A Robust Statistics Approach to Minimum Variance Portfolio Optimization
Abstract: We study the design of portfolios under a minimum risk criterion. The performance of the optimized portfolio relies on the accuracy of the estimated covariance matrix of the portfolio asset returns. For large portfolios, the number of available market returns is often of similar order to the number of assets, so that the sample covariance matrix performs poorly as a covariance estimator. Additionally, financial market data often contain outliers which, if not correctly handled, may further corrupt the covariance estimation. We address these shortcomings by studying the performance of a hybrid covariance matrix estimator based on Tyler's robust M-estimator and on Ledoit-Wolf's shrinkage estimator while assuming samples with a heavy-tailed distribution. Employing recent results from random matrix theory, we develop a consistent estimator of (a scaled version of) the realized portfolio risk, which is minimized by optimizing the shrinkage intensity online. Our portfolio optimization method is shown via simulations to outperform existing methods both for synthetic and real market data.


I. Introduction

The theory of portfolio optimization is generally associated 
with the classical mean-variance optimization framework of 
Markowitz [1]. The pitfalls of the mean-variance analysis are 
mainly related to its sensitivity to the estimation error of 
the means and covariance matrix of the asset returns. It is 
nonetheless argued that estimates of the covariance matrix 
are more accurate than those of the expected returns [2,3]. 
Thus, many studies concentrate on improving the performance 
of the global minimum variance portfolio (GMVP), which 
provides the lowest possible portfolio risk and involves only 
the covariance matrix estimate. 

A widely used covariance estimator is the well-known sample covariance matrix (SCM). However, covariance estimates for portfolio optimization commonly involve few historical observations of sometimes up to a thousand assets. In such a case, the number of independent samples n may be small compared to the covariance matrix dimension N, which suggests poor performance of the SCM. The impact of the estimation error on the out-of-sample performance of the GMVP based on the SCM has already been analyzed in [4-7].
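To make the sample-deficiency problem concrete, the following sketch runs a small experiment of our own (the function name `scm_eigenvalue_range` and all sizes are hypothetical, not taken from the paper). It draws n Gaussian samples whose true covariance is $I_N$ and reports the extreme eigenvalues of the SCM. Every population eigenvalue equals 1, yet for fixed $c = N/n$ the sample eigenvalues spread over approximately $[(1-\sqrt c)^2, (1+\sqrt c)^2]$, which is the support of the Marchenko-Pastur law.

```python
import numpy as np

def scm_eigenvalue_range(N, n, seed=0):
    """Smallest and largest eigenvalues of the SCM of n i.i.d. N(0, I_N) samples."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, N))          # rows are observations
    S = X.T @ X / n                          # sample covariance (zero mean known)
    eig = np.linalg.eigvalsh(S)              # eigenvalues in ascending order
    return eig[0], eig[-1]

N, n = 200, 400
c = N / n
lo, hi = scm_eigenvalue_range(N, n)
print(f"sample eigenvalues in [{lo:.2f}, {hi:.2f}]")
print(f"Marchenko-Pastur edges: [{(1 - c**0.5)**2:.2f}, {(1 + c**0.5)**2:.2f}]")
```

With n = 2N, the reported range is roughly [0.09, 2.9] rather than the true value {1}. A plug-in GMVP built from such a matrix over-weights the directions associated with the spuriously small eigenvalues.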
In the finance literature, several approaches have been proposed to get around the problem of the scarcity of samples. One approach is to impose some factor structure on the estimator of the covariance matrix [8,9], which reduces the number of parameters to be estimated. A second approach is to use, as a covariance matrix estimator, a weighted average of the sample covariance matrix and another estimator, such as the 1-factor covariance matrix or the identity matrix [10,11]. A third approach is nonlinear shrinkage estimation [12], which modifies each eigenvalue of the SCM under the framework of Markowitz's portfolio selection. A fourth approach comprises eigenvalue clipping methods [13-15], whose underlying idea is to "clean" the SCM by filtering noisy eigenvalues claimed to convey little valuable information. This approach has also been employed recently in proposing novel vaccine design strategies for infectious diseases [16,17], and its theoretical foundations have been examined in [18]. A fifth method employs a bootstrap-corrected estimator for the optimal return and its asset allocation, which reduces the error of over-prediction of the in-sample return by bootstrapping [6]. In contrast to all of these methods (which aim to improve the covariance matrix estimate), alternative methods have also been proposed which directly impose various constraints on the portfolio weights, such as a no-short-sale constraint [3], an L1 norm constraint and an L2 norm constraint [19,20]. It has been demonstrated that directly bounding the portfolio-weight vector can reduce the estimation error, particularly when the portfolio size is large [19].
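A short sketch illustrates the second (linear shrinkage) approach. It is a hypothetical example, not the specific estimators of [10,11]. It forms a convex combination of the SCM and a trace-scaled identity target with a fixed, user-chosen weight `alpha`. The function name `linear_shrinkage` and the choice of target are our assumptions, and data-driven weight selection as in Ledoit-Wolf [11] is not implemented.

```python
import numpy as np

def linear_shrinkage(X, alpha):
    """(1 - alpha) * SCM + alpha * (tr(SCM) / N) * I_N.

    X     : (n, N) array of observations (rows); centred internally.
    alpha : fixed shrinkage weight in [0, 1] (choosing it from data is not done here).
    """
    n, N = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                        # sample covariance matrix
    target = np.trace(S) / N * np.eye(N)     # identity target with the SCM's average variance
    return (1 - alpha) * S + alpha * target
```

Any `alpha > 0` makes the estimate positive definite even when n < N and the SCM is singular. This is the property that the ST estimator below inherits through its $\rho I_N$ term.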

In addition to the problem of sample deficiency, the return observations often exhibit impulsiveness and local loss of stationarity [21]. The methods mentioned above do not address this, and it leads to performance degradation. The field of robust estimation [22-25] is intended to deal with this problem. However, classical robust covariance estimators generally require n ≫ N and do not perform well (or are not even defined) when n ~ N, making them unsuitable for many modern applications. Recent works [26-32] based on random matrix theory have therefore considered robust estimation in the n ~ N regime. Two hybrid robust shrinkage covariance matrix estimates have been proposed in parallel in [29,30] and in [31], respectively, and both are built upon Tyler's robust M-estimator [23] and Ledoit-Wolf's shrinkage approach [11]. In [32], the authors show, by means of random matrix theory, that in the large n, N regime and under the assumption of elliptical vector observations, the estimators in [29,30] and [31] perform essentially the same. They can be analyzed thanks to their asymptotic closeness to well-known random matrix models. Therefore, in this paper, we concentrate on the estimator studied in [29,30], which we denote by $\hat C_{ST}$ (ST standing for shrinkage Tyler). Namely, for independent samples $x_1,\dots,x_n \in \mathbb{R}^N$ with zero mean, $\hat C_{ST}(\rho)$ is the unique solution to the fixed-point equation

$$\hat C_{ST}(\rho) = (1-\rho)\frac{1}{n}\sum_{t=1}^{n}\frac{x_t x_t^T}{\frac{1}{N}x_t^T \hat C_{ST}^{-1}(\rho)\, x_t} + \rho I_N$$

for any $\rho \in (\max\{1 - n/N, 0\}, 1]$. Note that the shrinkage structure even allows n < N.
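One way to compute $\hat C_{ST}(\rho)$ is to iterate the fixed-point equation above directly, assuming zero-mean samples as in that paragraph. The sketch below does this starting from $I_N$. We do not claim this is the numerical scheme used in [29,30]. The function name `shrinkage_tyler`, the starting point, the tolerance and the iteration cap are all our own choices.

```python
import numpy as np

def shrinkage_tyler(X, rho, tol=1e-10, max_iter=1000):
    """Approximate C_ST(rho) by plain fixed-point iteration.

    X   : (n, N) array whose rows are the (zero-mean) samples x_t
    rho : shrinkage parameter in (max(0, 1 - n/N), 1]
    """
    n, N = X.shape
    C = np.eye(N)                                    # starting point (our choice)
    for _ in range(max_iter):
        # q_t = (1/N) x_t^T C^{-1} x_t for every sample t
        q = np.einsum("ti,ti->t", X, np.linalg.solve(C, X.T).T) / N
        # right-hand side of the fixed-point equation
        C_new = (1 - rho) * (X.T / q) @ X / n + rho * np.eye(N)
        converged = np.linalg.norm(C_new - C) <= tol * np.linalg.norm(C)
        C = C_new
        if converged:
            break
    return C
```

The fixed point has a built-in sanity check. Multiplying the equation by $\hat C_{ST}^{-1}(\rho)$ and taking $\frac1N\mathrm{tr}$ gives $1 = (1-\rho) + \rho\frac1N\mathrm{tr}\,\hat C_{ST}^{-1}(\rho)$, so $\frac1N\mathrm{tr}\,\hat C_{ST}^{-1}(\rho) = 1$ whenever ρ > 0.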

This paper designs a novel minimum variance portfolio optimization strategy based on $\hat C_{ST}$ with a risk-minimizing (instead of Frobenius norm minimizing [32]) shrinkage parameter ρ. We first characterize the out-of-sample risk of the minimum variance portfolio with plug-in $\hat C_{ST}$ for all ρ within a specified range. This is done by analyzing the convergence of the achieved realized risk, uniformly in ρ, in the double limit regime where $N, n \to \infty$ with $c_N = N/n \to c \in (0,\infty)$. We subsequently provide a consistent estimator of the realized portfolio risk (more precisely, of a scaled version of it) that is defined only in terms of the observed returns. Based on this, we obtain a risk-optimized ST covariance estimator by optimizing online over ρ, and thus our optimized portfolio.

The proposed portfolio selection is shown to achieve superior performance over the competing methods in [11,31-33] in minimizing the realized portfolio risk under the GMVP framework for impulsive data. The outperformance of our portfolio optimization strategy compared to other methods is demonstrated through Monte Carlo simulations with elliptically distributed samples, as well as with real data of historical (daily) stock returns from Hong Kong's Hang Seng Index (HSI).

Notations: Boldface upper case letters denote matrices, boldface lower case letters denote column vectors, and standard lower case letters denote scalars. $(\cdot)^T$ denotes transpose. $I_N$ denotes the $N \times N$ identity matrix and $1_N$ denotes an N-dimensional vector with all entries equal to one. $\mathrm{tr}[\cdot]$ denotes the matrix trace operator. $\mathbb{R}$ and $\mathbb{C}$ denote the real and complex fields of dimension specified by a superscript. $\|\cdot\|$ denotes the Euclidean norm for vectors and the spectral norm for matrices. The Dirac measure at point x is denoted by $\delta_x$. The ordered eigenvalues of a symmetric matrix X of size $N \times N$ are denoted by $\lambda_1(X) \le \dots \le \lambda_N(X)$, and the cardinality of a set $\mathcal{C} \subset \mathbb{R}$ is denoted by $|\mathcal{C}|$. Letting U, V be symmetric $N \times N$ matrices, we write $U \succeq V$ if $U - V$ is positive semidefinite.


II. Data model and problem formulation 

We consider a time series comprising $x_1,\dots,x_n \in \mathbb{R}^N$, the logarithmic returns of N financial assets. We assume the $x_t$ to be independent and identically distributed (i.i.d.) with

$$x_t = \mu + \sqrt{\tau_t}\, C_N^{1/2} y_t, \quad t = 1,2,\dots,n, \tag{1}$$

where $\mu \in \mathbb{R}^N$ is the mean vector of the asset returns, $\tau_t$ is a real, positive random variable, $C_N \in \mathbb{R}^{N\times N}$ is positive definite and $y_t \in \mathbb{R}^N$ is a zero-mean unitarily invariant random vector with norm $\|y_t\|^2 = N$, independent of the $\tau_t$'s. It is assumed that μ and $C_N$ are time-invariant over the observation period. Denote $z_t = C_N^{1/2} y_t$. The model (1) for $x_t$ embraces in particular the class of elliptical distributions, including the multivariate normal distribution, exponential distribution and the multivariate Student-t distribution as special cases. This model for $x_t$ leads to tractable and adaptable design solutions and is a commonly used approximation of the impulsive nature of financial data [10].
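The sketch below draws samples from model (1) for given $\tau_t$. It takes $y_t$ uniform on the sphere of radius $\sqrt N$, which is one valid choice of a unitarily invariant vector with $\|y_t\|^2 = N$, and uses the symmetric square root of $C_N$. The function name `sample_returns` and this particular choice of $y_t$ are our assumptions.

```python
import numpy as np

def sample_returns(mu, C, tau, rng):
    """Draw x_t = mu + sqrt(tau_t) C^{1/2} y_t for t = 1..n, as in model (1).

    mu  : (N,) mean vector;  C : (N, N) positive definite matrix
    tau : (n,) positive scalars tau_t
    y_t is drawn uniformly on the sphere of radius sqrt(N), so ||y_t||^2 = N.
    """
    N = C.shape[0]
    n = tau.shape[0]
    g = rng.standard_normal((n, N))
    y = np.sqrt(N) * g / np.linalg.norm(g, axis=1, keepdims=True)
    # symmetric square root of C via its eigendecomposition
    w, V = np.linalg.eigh(C)
    C_half = (V * np.sqrt(w)) @ V.T
    return mu + np.sqrt(tau)[:, None] * (y @ C_half)     # one sample per row
```

Drawing, for instance, `tau = d / rng.chisquare(d, size=n)` yields heavy-tailed returns. For large N this is close to a multivariate Student-t with d degrees of freedom, because a Gaussian vector g satisfies $\|g\|^2/N \approx 1$.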

Let $h \in \mathbb{R}^N$ denote the portfolio selection, i.e., the vector of asset holdings in units of currency normalized by the total outstanding wealth, satisfying $h^T 1_N = 1$. In this paper, short-selling is allowed, and thus the portfolio weights may be negative. Then the portfolio variance (or risk) over the investment period of interest is defined as $\sigma^2(h) = E\left[|h^T z_t|^2\right] = h^T C_N h$ [1]. Accordingly, the GMVP selection problem can be formulated as the following quadratic optimization problem with a linear constraint:

$$\min_h\ \sigma^2(h) \quad \text{s.t.} \quad h^T 1_N = 1.$$

This has the well-known solution

$$h_{GMVP} = \frac{C_N^{-1}1_N}{1_N^T C_N^{-1} 1_N},$$

and the corresponding portfolio risk is

$$\sigma^2(h_{GMVP}) = \frac{1}{1_N^T C_N^{-1} 1_N}. \tag{2}$$


Here, (2) represents the theoretical minimum portfolio risk bound, attained upon knowing the covariance matrix $C_N$ exactly. In practice, $C_N$ is unknown, and instead we form an estimate, denoted by $\hat C_N$. Thus, the GMVP selection based on the plug-in estimator $\hat C_N$ is given by

$$\hat h_{GMVP} = \frac{\hat C_N^{-1}1_N}{1_N^T \hat C_N^{-1} 1_N}.$$

The quality of $\hat h_{GMVP}$, implemented based on the in-sample covariance prediction $\hat C_N$, can be measured by its achieved out-of-sample (or "realized") portfolio risk:

$$\sigma^2(\hat h_{GMVP}) = \frac{1_N^T \hat C_N^{-1} C_N \hat C_N^{-1} 1_N}{\left(1_N^T \hat C_N^{-1} 1_N\right)^2}.$$

The goal is to construct a good estimator $\hat C_N$, and consequently $\hat h_{GMVP}$, which minimizes this quantity.

Note that the naive uniform diversification rule uses $h = \frac{1}{N}1_N$. This is equivalent to setting $\hat C_N = I_N$, and yields the realized portfolio risk $\frac{1_N^T C_N 1_N}{N^2}$. Interestingly, this extremely simple strategy has been shown in [34] to outperform numerous optimized models, and it will serve as a benchmark in our work.
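The quantities defined above translate directly into a few lines of linear algebra. The function names below are our own. Weights are obtained by solving $\hat C_N u = 1_N$ rather than by forming an explicit inverse.

```python
import numpy as np

def gmvp_weights(C_hat):
    """Plug-in GMVP weights h = C_hat^{-1} 1_N / (1_N^T C_hat^{-1} 1_N)."""
    ones = np.ones(C_hat.shape[0])
    u = np.linalg.solve(C_hat, ones)        # C_hat^{-1} 1_N without forming the inverse
    return u / u.sum()

def realized_risk(h, C_true):
    """Out-of-sample risk sigma^2(h) = h^T C_N h."""
    return float(h @ C_true @ h)

def risk_bound(C_true):
    """Minimum achievable risk (2): 1 / (1_N^T C_N^{-1} 1_N)."""
    return 1.0 / np.linalg.solve(C_true, np.ones(C_true.shape[0])).sum()
```

Passing $C_N$ itself to `gmvp_weights` attains the bound (2). Passing `np.eye(N)` returns the uniform weights $1_N/N$, whose realized risk is $1_N^TC_N1_N/N^2$.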


III. Novel covariance estimator and portfolio design for minimizing risk

A. Tyler’s robust M-estimator with linear shrinkage 

Consider the ST covariance matrix estimate introduced in [29,30], built upon both Tyler's M-estimate [23] and the Ledoit-Wolf shrinkage estimator [11]. This estimator accounts for the scarcity of samples, even allowing N > n. It also exhibits robustness to outliers or impulsive samples, e.g., elliptically distributed data. It is defined as the unique solution to the following fixed-point equation for $\rho \in (\max\{0, 1 - n/N\}, 1]$:

$$\hat C_{ST}(\rho) = (1-\rho)\frac{1}{n}\sum_{t=1}^{n}\frac{\tilde x_t \tilde x_t^T}{\frac{1}{N}\tilde x_t^T \hat C_{ST}^{-1}(\rho)\,\tilde x_t} + \rho I_N, \tag{3}$$

where $\tilde x_t = x_t - \frac{1}{n}\sum_{i=1}^{n} x_i$.

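Since with probability one the $x_t$ are linearly independent, $\hat C_{ST}(\rho)$ is almost surely defined for each N and n [29, Theorem III.1]. The corresponding GMVP selection is

$$\hat h_{ST}(\rho) = \frac{\hat C_{ST}^{-1}(\rho)1_N}{1_N^T\hat C_{ST}^{-1}(\rho)1_N}$$

with realized portfolio risk

$$\sigma^2\left(\hat h_{ST}(\rho)\right) = \frac{1_N^T\hat C_{ST}^{-1}(\rho)\,C_N\,\hat C_{ST}^{-1}(\rho)1_N}{\left(1_N^T\hat C_{ST}^{-1}(\rho)1_N\right)^2}. \tag{4}$$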


Our goal is to optimize ρ online such that (4) is minimized. However, since (4) involves $C_N$, which is unobservable, it cannot be optimized directly. Also note that the naive approach of simply replacing $C_N$ with $\hat C_{ST}(\rho)$ in (4) would yield the so-called "in-sample risk". This underestimates the realized portfolio risk and leads to overly optimistic investment decisions [33]. We tackle this problem by obtaining a consistent estimator of a scaled version of the realized risk (4) as n and N go to infinity at the same rate. Classical asymptotic theory for time series analysis and mathematical statistics typically deals with the case of N fixed and n → ∞. In contrast, a double-limit regime is more relevant for large portfolio problems, where n is comparable to N. To this end, following [33], we first derive a deterministic asymptotic equivalent of (4) and then provide a consistent estimator based on it.
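The underestimation by the in-sample risk is easy to observe numerically. The sketch below is an illustrative experiment under our own assumptions: a hypothetical "true" covariance, Gaussian returns, and the SCM as the plug-in estimator (chosen for simplicity instead of $\hat C_{ST}$). It compares the in-sample risk, the bound (2) and the realized risk (4).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 100
A = rng.standard_normal((N, N)) / np.sqrt(N)
C_true = A @ A.T + 0.5 * np.eye(N)            # hypothetical "true" covariance C_N
L = np.linalg.cholesky(C_true)
X = rng.standard_normal((n, N)) @ L.T         # Gaussian returns with covariance C_true
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n                             # sample covariance (SCM)

ones = np.ones(N)
u = np.linalg.solve(S, ones)
h = u / u.sum()                               # plug-in GMVP built from the SCM

in_sample = h @ S @ h                         # C_N replaced by its own estimate
realized = h @ C_true @ h                     # true out-of-sample risk, as in (4)
bound = 1.0 / np.linalg.solve(C_true, ones).sum()   # minimum risk (2)
print(in_sample, bound, realized)
```

With N/n = 0.5, the in-sample risk comes out well below the bound (2), while the realized risk lies well above it. This illustrates why the in-sample risk cannot be used to tune ρ.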


B. Deterministic equivalent of the realized portfolio risk 
For our asymptotic analysis, we assume the following: 

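Assumption 1.

a. As $N, n \to \infty$, $N/n = c_N \to c \in (0,\infty)$.

b. The $\tau_t$, $t = 1,\dots,n$, are i.i.d., $\tau_1,\dots,\tau_n > \xi$ a.s. for some $\xi > 0$, and $E[\tau_1] < \infty$.¹

c. Denote by $0 < \lambda_1 \le \dots \le \lambda_N$ the ordered eigenvalues of $C_N$. As $N, n \to \infty$, $\nu_N = \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i}$ satisfies $\nu_N \to \nu$ weakly with $\nu \neq \delta_0$ almost everywhere. In addition, $\limsup_N \lambda_N < \infty$.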

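We also introduce some further definitions, which will arise in our asymptotic analysis. For $\rho \in (\max\{0, 1-c^{-1}\}, 1]$, define γ as the unique positive solution to

$$1 = \int \frac{t}{\gamma\rho + (1-\rho)t}\,\nu(dt) \tag{5}$$

and

$$\beta = c\int\frac{t^2}{\left(\rho + \frac{1-\rho}{\gamma}t\right)^2}\,\nu(dt).$$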
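In simulations where $C_N$ is known, γ can be computed numerically by replacing ν with the empirical eigenvalue distribution $\nu_N$ of $C_N$. This replacement is our assumption. The right-hand side of (5) is strictly decreasing in γ. It is larger than 1 for small γ > 0 (it tends to $1/(1-\rho)$ when ρ < 1 and diverges when ρ = 1), and it tends to 0 as γ → ∞, so bisection suffices. The name `solve_gamma` and the bracket are our own choices.

```python
import numpy as np

def solve_gamma(eigs, rho, iters=200):
    """Bisection for the unique gamma > 0 with
    mean(t / (gamma * rho + (1 - rho) * t)) = 1, i.e. (5) with nu replaced
    by the empirical distribution of the eigenvalues `eigs` (all > 0)."""
    t = np.asarray(eigs, dtype=float)
    lo, hi = 0.0, t.max() / rho          # at gamma = hi the left side is <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.mean(t / (mid * rho + (1 - rho) * t)) > 1.0:
            lo = mid                     # left side too large: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Two closed-form checks follow from (5). If all eigenvalues equal s, then γ = s for every ρ. At ρ = 1, γ(1) is the mean eigenvalue, i.e., $\int t\,\nu(dt)$.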

¹For technical reasons, made explicit in the appendix, we require the quantities $\tilde z_t = z_t - \frac{1}{\sqrt{\tau_t}}\frac{1}{n}\sum_{i=1}^n \sqrt{\tau_i}\, z_i$ to have controllable norms. This imposes the constraint $\tau_t > \xi > 0$, which might be possible to relax at the expense of increased mathematical complexity.


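The following theorem presents our first key result: a deterministic characterization of the asymptotic realized portfolio risk achieved with $\hat C_{ST}(\rho)$.

Theorem 1. Let Assumption 1 hold. For $\varepsilon \in (0, \min\{1, c^{-1}\})$, define $\mathcal{R}_\varepsilon = [\varepsilon + \max\{0, 1-c^{-1}\}, 1]$. Then, as $N, n \to \infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\sigma^2\left(\hat h_{ST}(\rho)\right) - \bar\sigma^2(\rho)\right| \xrightarrow{\text{a.s.}} 0, \tag{6}$$

where

$$\bar\sigma^2(\rho) = \frac{\gamma^2}{\gamma^2 - \beta(1-\rho)^2}\cdot\frac{1_N^T\left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1}C_N\left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1}1_N}{\left(1_N^T\left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1}1_N\right)^2}.$$

Proof: See Appendix B.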


Remark 1. In Theorem 1, the set $\mathcal{R}_\varepsilon$ excludes the region $[0, \varepsilon + \max\{0, 1-c^{-1}\})$. Because we handle the uniformity of the convergence (6), the proof of Theorem 1 requires us to work on sequences $\{\rho_n\}_{n=1}^\infty$ of ρ. It is however difficult to handle the limit of $\sigma^2(\hat h_{ST}(\rho_n)) - \bar\sigma^2(\rho_n)$ for a sequence $\{\rho_n\}_{n=1}^\infty$ with $\rho_n \to 0$. This follows from the same reasoning as that in [32]: see Equations (5) and (6) in Section 5.1 of [32], as well as Equation (12) in Appendix A, where $\rho_n \to \rho_0 > 0$ is necessary to ensure that the limiting quantity is smaller than one. In the subsequent results, $\rho \in \mathcal{R}_\varepsilon$ is also required for the same reason.


Theorem 1 enables us to analyze the convergence of the realized portfolio risk of $\hat h_{ST}(\rho)$ in the regime of Assumption 1-a. To calibrate the shrinkage parameter ρ for optimal GMVP performance, however, only the available sample data can be used, and certainly not the unknown $C_N$. This is the objective of the next subsection.


C. Consistent estimation of scaled realized portfolio risk 

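Based on the observable data only, we can obtain an estimator of a scaled version of the realized portfolio risk, $\sigma^2(\hat h_{ST}(\rho))/\kappa$, where we define $\kappa = \int t\,\nu(dt)$. We begin with the following lemma, which provides a consistent estimator of γ scaled by $1/\kappa$, denoted $\hat\gamma_{sc}$ ("sc" standing for "scaled").

Lemma 1. Under the settings of Theorem 1, as $N, n \to \infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\hat\gamma_{sc}(\rho) - \frac{\gamma(\rho)}{\kappa}\right| \xrightarrow{\text{a.s.}} 0, \tag{7}$$

where

$$\hat\gamma_{sc}(\rho) = \frac{1}{1-(1-\rho)c_N}\,\frac{1}{n}\sum_{t=1}^{n}\frac{\tilde x_t^T\hat C_{ST}^{-1}(\rho)\,\tilde x_t}{\|\tilde x_t\|^2}.$$

Proof: See Appendix C.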
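The estimator in Lemma 1 uses only the centred returns and the ST estimate, so it is simple to evaluate. The sketch below transcribes the expression for $\hat\gamma_{sc}(\rho)$ given above. The function name `gamma_sc` and the array layout (one centred return vector $\tilde x_t$ per row) are our own conventions. At ρ = 1, (3) gives $\hat C_{ST} = I_N$ exactly, so $\hat\gamma_{sc}(1) = 1$. This matches $\gamma(1)/\kappa = 1$, since (5) gives $\gamma(1) = \int t\,\nu(dt) = \kappa$.

```python
import numpy as np

def gamma_sc(Xc, C_st, rho):
    """Estimator gamma_sc(rho) of gamma(rho) / kappa (Lemma 1).

    Xc   : (n, N) centred returns, one x~_t per row
    C_st : the ST estimate C_ST(rho) computed from the same Xc
    """
    n, N = Xc.shape
    cN = N / n
    quad = np.einsum("ti,ti->t", Xc, np.linalg.solve(C_st, Xc.T).T)  # x~_t^T C^{-1} x~_t
    norms2 = np.einsum("ti,ti->t", Xc, Xc)                             # ||x~_t||^2
    return np.mean(quad / norms2) / (1 - (1 - rho) * cN)
```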

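The following theorem provides a consistent estimator of $\sigma^2(\hat h_{ST}(\rho))$ scaled by $1/\kappa$, denoted $\hat\sigma^2_{sc}(\rho)$. This is our second main result.

$$\hat\sigma^2_{sc}(\rho) = \hat\gamma_{sc}(\rho)\,\frac{1_N^T\hat C_{ST}^{-1}(\rho)\left(\hat C_{ST}(\rho) - \rho I_N\right)\hat C_{ST}^{-1}(\rho)1_N}{\left((1-\rho) - (1-\rho)^2c_N\right)\left(1_N^T\hat C_{ST}^{-1}(\rho)1_N\right)^2} \tag{8}$$

Theorem 2. Under the settings of Theorem 1, as $N, n \to \infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\left|\hat\sigma^2_{sc}(\rho) - \frac{1}{\kappa}\sigma^2\left(\hat h_{ST}(\rho)\right)\right| \xrightarrow{\text{a.s.}} 0,$$

where $\hat\sigma^2_{sc}(\rho)$ is defined in (8).

Proof: See Appendix D.

Note that, since κ is independent of ρ, the same ρ minimizes both $\sigma^2(\hat h_{ST}(\rho))$ and $\sigma^2(\hat h_{ST}(\rho))/\kappa$.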
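Like Lemma 1, (8) can be evaluated from observable quantities only. The sketch below transcribes (8) as written above. The name `sigma2_sc` is hypothetical. It recomputes $\hat\gamma_{sc}(\rho)$ internally so that the block is self-contained. It also uses $1_N^T\hat C^{-1}(\hat C - \rho I_N)\hat C^{-1}1_N = 1_N^T\hat C^{-1}1_N - \rho\|\hat C^{-1}1_N\|^2$ to avoid forming matrix inverses. The expression is 0/0 at ρ = 1, so it should only be evaluated for ρ < 1.

```python
import numpy as np

def sigma2_sc(Xc, C_st, rho):
    """Scaled risk estimator (8): consistent for sigma^2(h_ST(rho)) / kappa.

    Xc   : (n, N) centred returns, one x~_t per row
    C_st : the ST estimate C_ST(rho) computed from the same Xc (rho < 1)
    """
    n, N = Xc.shape
    cN = N / n
    # gamma_sc(rho) from Lemma 1
    quad = np.einsum("ti,ti->t", Xc, np.linalg.solve(C_st, Xc.T).T)
    g_sc = np.mean(quad / np.einsum("ti,ti->t", Xc, Xc)) / (1 - (1 - rho) * cN)
    u = np.linalg.solve(C_st, np.ones(N))            # C_ST^{-1} 1_N
    num = u.sum() - rho * (u @ u)                    # 1^T C^{-1} (C - rho I) C^{-1} 1
    den = ((1 - rho) - (1 - rho) ** 2 * cN) * u.sum() ** 2
    return g_sc * num / den
```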

The following corollary of Theorem 2 is of fundamental importance. It demonstrates that choosing ρ to minimize $\hat\sigma^2_{sc}(\rho)$ is asymptotically equivalent to minimizing the unobservable $\sigma^2(\hat h_{ST}(\rho))$.

Corollary 1. Denote by $\rho^\circ$ and $\rho^*$ the minimizers of $\hat\sigma^2_{sc}(\rho)$ and $\sigma^2(\hat h_{ST}(\rho))$ over $\mathcal{R}_\varepsilon$, respectively. Then, under the settings of Theorem 1 and Theorem 2, as $N, n \to \infty$,

$$\sigma^2\left(\hat h_{ST}(\rho^\circ)\right) - \sigma^2\left(\hat h_{ST}(\rho^*)\right) \xrightarrow{\text{a.s.}} 0.$$

Proof: See Appendix E.

With this result, the GMVP optimization problem is reduced to the minimization of $\hat\sigma^2_{sc}(\rho)$, which can be done with a simple numerical search.

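To summarize, given n past return observations of N assets, our proposed algorithm to construct a portfolio with minimal risk can be described as follows:

Algorithm 1 Proposed algorithm for GMVP optimization

1) Compute the optimized shrinkage parameter via a numerical search:

$$\rho^\circ = \operatorname*{argmin}_{\rho\in[\varepsilon+\max\{0,1-c^{-1}\},\,1]}\left\{\hat\sigma^2_{sc}(\rho)\right\}.$$

2) Form the risk-minimizing ST estimator $\hat C_{ST}^\circ$, the unique solution to

$$\hat C_{ST}^\circ = (1-\rho^\circ)\frac{1}{n}\sum_{t=1}^n\frac{\tilde x_t\tilde x_t^T}{\frac{1}{N}\tilde x_t^T\left(\hat C_{ST}^\circ\right)^{-1}\tilde x_t} + \rho^\circ I_N.$$

3) Construct the optimized portfolio

$$\hat h_{ST}^\circ = \frac{\left(\hat C_{ST}^\circ\right)^{-1}1_N}{1_N^T\left(\hat C_{ST}^\circ\right)^{-1}1_N}.$$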
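The three steps of Algorithm 1 can be combined into one end-to-end sketch. It rests on our own assumptions. The "simple numerical search" of step 1 is implemented as a uniform grid that stops short of ρ = 1, where (8) is 0/0 because $\hat C_{ST} = I_N$ there. The fixed-point solver is plain iteration from $I_N$, and $c^{-1}$ is replaced by its sample value n/N. All function names and defaults (`eps`, `grid_size`) are hypothetical.

```python
import numpy as np

def shrinkage_tyler(Xc, rho, tol=1e-9, max_iter=1000):
    """Fixed-point iteration for (3); Xc holds the centred returns as rows."""
    n, N = Xc.shape
    C = np.eye(N)
    for _ in range(max_iter):
        q = np.einsum("ti,ti->t", Xc, np.linalg.solve(C, Xc.T).T) / N
        C_new = (1 - rho) * (Xc.T / q) @ Xc / n + rho * np.eye(N)
        converged = np.linalg.norm(C_new - C) <= tol * np.linalg.norm(C)
        C = C_new
        if converged:
            break
    return C

def sigma2_sc(Xc, C, rho):
    """Scaled realized-risk estimator (8), including gamma_sc of Lemma 1."""
    n, N = Xc.shape
    cN = N / n
    quad = np.einsum("ti,ti->t", Xc, np.linalg.solve(C, Xc.T).T)
    g_sc = np.mean(quad / np.einsum("ti,ti->t", Xc, Xc)) / (1 - (1 - rho) * cN)
    u = np.linalg.solve(C, np.ones(N))
    num = u.sum() - rho * (u @ u)
    den = ((1 - rho) - (1 - rho) ** 2 * cN) * u.sum() ** 2
    return g_sc * num / den

def risk_optimized_gmvp(X, eps=0.05, grid_size=40):
    """Steps 1-3 of Algorithm 1 with a grid search over rho in [rho_min, 1)."""
    n, N = X.shape
    Xc = X - X.mean(axis=0)                          # x~_t = x_t - sample mean
    rho_min = eps + max(0.0, 1 - n / N)
    grid = np.linspace(rho_min, 1.0, grid_size, endpoint=False)
    scores = [sigma2_sc(Xc, shrinkage_tyler(Xc, r), r) for r in grid]   # step 1
    rho_opt = grid[int(np.argmin(scores))]
    C_opt = shrinkage_tyler(Xc, rho_opt)             # step 2
    u = np.linalg.solve(C_opt, np.ones(N))
    return rho_opt, u / u.sum()                      # step 3
```

A finer grid or a bounded scalar minimizer would serve equally well for step 1. The grid is used here only because it is transparent and needs no extra dependency.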


IV. Simulation results 

We use both synthetic data and real market data to show the performance of $\hat C_{ST}$ compared to the following competing methods:

1) $\hat C_P$, referred to as the Abramovich-Pascal estimate from [32];

2) $\hat C_C$, referred to as the Chen estimate from [32];

3) $\hat C_{C2}$, the oracle estimator in [31], which has the same structure as $\hat C_C$, but resorts to solving an approximate problem of minimizing the Frobenius distance to find the optimal shrinkage;

4) $\hat C_{LW}$, the Ledoit-Wolf shrinkage estimator in [11];

5) $\hat C_R$, the Rubio estimator proposed in [33], which has the same structure as $\hat C_{LW}$, but with ρ calibrated based on the GMVP framework, as in the present article.


A. Synthetic data simulations 

The synthetic data are generated i.i.d. from a multivariate Student-t distribution, where $\sqrt{\tau_t} = \sqrt{d/\chi^2_d}$, d = 3, and $\chi^2_d$ is a chi-square random variable with d degrees of freedom. We set N = 200. The mean vector μ can be set arbitrarily: it is removed by subtracting the empirical mean and has no impact on the covariance estimates. We assume the population covariance matrix $C_N$ is based on a one-factor return structure [35]: $C_N = bb^T\sigma^2 + \Sigma$, where σ = 0.16. The factor loadings $b \in \mathbb{R}^N$ are evenly spread between 0.5 and 1.5. The residual variance matrix $\Sigma \in \mathbb{R}^{N\times N}$ is set to be diagonal and proportional to the identity matrix: $\Sigma = \sigma_r^2 I_N$, where $\sigma_r = 0.2$.
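The synthetic setup above can be reproduced along the following lines. Two details are our assumptions: "evenly spread" loadings are taken as `np.linspace(0.5, 1.5, N)` including both endpoints, and the Student-t draws use the textbook construction $x = \mu + \sqrt{d/\chi^2_d}\,C_N^{1/2}g$ with g standard Gaussian. For $C_N^{1/2}$ we use a Cholesky factor, which gives the same distribution as the symmetric square root. The function names are hypothetical.

```python
import numpy as np

def one_factor_cov(N, sigma=0.16, sigma_r=0.2):
    """C_N = sigma^2 b b^T + sigma_r^2 I_N with loadings b evenly spread in [0.5, 1.5]."""
    b = np.linspace(0.5, 1.5, N)
    return sigma**2 * np.outer(b, b) + sigma_r**2 * np.eye(N)

def student_t_returns(C, n, d=3, mu=None, rng=None):
    """n i.i.d. multivariate Student-t draws (d degrees of freedom, scatter matrix C),
    built as x = mu + sqrt(d / chi2_d) * L g with L L^T = C and g ~ N(0, I_N)."""
    rng = np.random.default_rng() if rng is None else rng
    N = C.shape[0]
    mu = np.zeros(N) if mu is None else mu
    g = rng.standard_normal((n, N)) @ np.linalg.cholesky(C).T   # rows ~ N(0, C)
    scale = np.sqrt(d / rng.chisquare(d, size=n))               # heavy-tailed mixing
    return mu + scale[:, None] * g
```

By construction, $C_N$ has one spiked eigenvalue $\sigma^2\|b\|^2 + \sigma_r^2$, and the remaining N − 1 eigenvalues equal $\sigma_r^2 = 0.04$.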

Fig. 1 illustrates the performance of the different estimation approaches in terms of the realized risk, averaged over 200 Monte Carlo simulations. The risk bound is computed by (2), the theoretical minimum portfolio risk. Compared to the other methods, our proposed estimator $\hat C_{ST}$ achieves the smallest realized risk for both n < N and n > N. We omit the realized risks achieved by $\hat C_N = I_N$, as they are uniformly more than five times as large as those achieved by the other methods.

It is interesting to compare the optimized ρ of $\hat C_{ST}$ and $\hat C_P$. Both are solutions of (3), but with ρ optimized under different metrics: minimizing the risk and minimizing the Frobenius distance, respectively. As shown in Fig. 2, the optimal shrinkage parameter differs between the two metrics. Interestingly, optimizing ρ under the risk function, as opposed to the Frobenius distance, leads to more aggressive shrinkage (regularization) towards the identity matrix. This produces a portfolio allocation which is closer to the uniform allocation policy.


B. Real market data simulations 

We now investigate the out-of-sample portfolio performance 
of the different estimators with the real market data. We 
consider the stocks comprising the HSI. In particular, we 
use the dividend-adjusted daily closing prices downloaded 
from the Yahoo Finance database to obtain the continuously 
compounded (logarithmic) returns for the 45 constituents of 
the HSI over L = 736 working days, from Jan. 3, 2011 to 
Dec. 31, 2013 (excluding weekends and public holidays). 

As conventionally done in the financial literature, the out-of-sample evaluation is defined in terms of a rolling-window method. At a particular day t, we use the previous n days (i.e., from t − n to t − 1) as the training window for covariance estimation and construct the portfolio selection $\hat h_{GMVP}$. We then use $\hat h_{GMVP}$ to compute the portfolio returns in the following 10 days. Next, the window is shifted 10 days forward and the portfolio returns for another 10 days are computed. This procedure is repeated until the end of the data. The realized risk is computed conventionally as the annualized sample standard deviation of the corresponding GMVP returns. In our tests, different training window lengths are considered.

Fig. 1. The average realized portfolio risk of different covariance estimators in the GMVP framework using synthetic data (horizontal axis: n, with N = 200).

Fig. 2. The optimal shrinkage parameters of $\hat C_{ST}$ and $\hat C_P$ in the synthetic data simulation.
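The rolling-window evaluation just described can be sketched as follows. The function names, the pluggable `weight_fn` and the annualization factor of 252 trading days are our assumptions; the paper does not state the factor. With L = 736 days and n = 300, the scheme yields L − n = 436 out-of-sample returns, the count reported later in this section.

```python
import numpy as np

def rolling_gmvp_returns(R, n, weight_fn, hold=10):
    """Out-of-sample portfolio returns from the rolling-window scheme.

    R         : (L, N) array of daily log returns
    n         : training window length
    weight_fn : maps an (n, N) training block to portfolio weights summing to 1
    hold      : days each portfolio is held before the window is shifted
    """
    L = R.shape[0]
    out = []
    for t in range(n, L, hold):
        h = weight_fn(R[t - n:t])          # train on days t-n, ..., t-1
        out.append(R[t:t + hold] @ h)      # returns over the next `hold` days
    return np.concatenate(out)

def annualized_std(r, periods=252):
    """Annualized sample standard deviation (252 trading days per year assumed)."""
    return np.std(r, ddof=1) * np.sqrt(periods)
```

Any covariance estimator can be plugged in through `weight_fn`, for example a function that computes a shrinkage estimate of the training block and returns its plug-in GMVP weights.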

Fig. 3 shows that the proposed $\hat C_{ST}$ achieves the smallest realized risk. It outperforms the other methods over the entire span of considered estimation windows. The realized risk achieved by $\hat C_N = I_N$ is also omitted here because it is more than double those achieved by the competing methods. When the estimation window is too long (e.g., greater than 320 days), we observe that the performance starts to degrade systematically. This is presumably due to a lack of stationarity in the data over such long durations. This highlights an interesting phenomenon worthy of further consideration, but a detailed study falls beyond the scope of the current contribution.

$\hat C_{ST}$ achieves its lowest risk when the estimation window length is 300. Table I presents the risks obtained by the different covariance estimators at this optimal estimation window length of 300. We also test whether the pairwise differences between the portfolio variance achieved by $\hat C_{ST}$ and each benchmark strategy are statistically different from zero. Standard hypothesis tests are not valid when returns have tails heavier than the normal distribution or are correlated across time. We therefore follow the method described in [36] and [37] and employ a studentized version of the circular block bootstrap [38] to perform the test. The p-values are computed under the null hypothesis that the portfolio variance achieved by a particular benchmark covariance matrix estimator is equal to that achieved by $\hat C_{ST}$. We use a block length b = 5 and base our reported p-values on 2000 bootstrap iterations. We also compute the p-values for block lengths b = 1 and b = 10. The interpretation of the results does not change for b = 1, b = 5, or b = 10. This suggests that the temporal correlations of the stock returns are weak and that our i.i.d. assumption on the data is acceptable. In the row reporting the risks, statistically significant outperformance of $\hat C_{ST}$ over the other methods is denoted by asterisks: ** denotes significance at the 0.01 level (p < 0.01) and * denotes significance at the 0.05 level (p < 0.05). It can be seen from Table I that the outperformance of our proposed method is statistically significant, with p < 0.05 in all cases.
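The resampling step of the circular block bootstrap [38] is sketched below. This is not the full studentized test of [36,37]: that test additionally studentizes the variance difference and centres the bootstrap distribution before computing p-values. The sketch only shows how blocks of b consecutive days are drawn with wrap-around, applying the same indices to both return series so that their cross-dependence is preserved. Function names and defaults are hypothetical.

```python
import numpy as np

def circular_block_indices(T, b, rng):
    """Time indices for one circular block bootstrap resample of length T.

    Blocks of b consecutive indices start at uniformly drawn positions and
    wrap around modulo T; the concatenated blocks are truncated to length T.
    """
    n_blocks = -(-T // b)                                   # ceil(T / b)
    starts = rng.integers(0, T, size=n_blocks)
    idx = (starts[:, None] + np.arange(b)[None, :]) % T
    return idx.ravel()[:T]

def bootstrap_variance_diff(r1, r2, b=5, B=2000, seed=0):
    """Bootstrap draws of var(r1) - var(r2), resampling both series with the
    same blocks so that their cross-dependence is preserved."""
    rng = np.random.default_rng(seed)
    T = len(r1)
    draws = np.empty(B)
    for k in range(B):
        idx = circular_block_indices(T, b, rng)
        draws[k] = np.var(r1[idx], ddof=1) - np.var(r2[idx], ddof=1)
    return draws
```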

As a further comparison, to investigate the performance with finer temporal resolution than in Fig. 3, we carry out a rolling-window analysis of the realized risks. Under the optimal estimation window length of 300, we obtain 436 out-of-sample portfolio returns. From the start of the data, we use the most recent 70 out-of-sample portfolio returns to compute the (annualized) standard deviations of the GMVP. Shifting one day forward, we repeat this procedure until the end of the portfolio returns. For each covariance matrix estimator, this results in 367 risk measurements, which are displayed as a time series in Fig. 4. We find that 69.2% of the time, $\hat C_{ST}$ achieves the lowest risk among all methods. In addition, $\hat C_{ST}$ exhibits the greatest outperformance during the period of high volatility, that is, when 230 < t < 300. This indicates that our proposed GMVP optimization strategy is robust to market fluctuations and possibly even to outliers.
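The finer-resolution analysis above amounts to a sliding 70-day window over the 436 out-of-sample returns, which gives 436 − 70 + 1 = 367 measurements. The sketch below computes these rolling risks. It also computes the share of time points at which a given method has the lowest risk, as in the 69.2% figure. The annualization factor of 252 and the function names are our assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_annualized_std(r, window=70, periods=252):
    """Annualized std of the `window` most recent returns, shifted one day at a time."""
    w = sliding_window_view(r, window)          # shape (len(r) - window + 1, window)
    return w.std(axis=1, ddof=1) * np.sqrt(periods)

def share_of_days_lowest(risks, k=0):
    """Fraction of time points at which method k has the lowest rolling risk.

    risks : (n_methods, n_times) array of rolling risk measurements
    """
    return float(np.mean(np.argmin(risks, axis=0) == k))
```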

V. Conclusions 

We have proposed a novel minimum-variance portfolio optimization strategy based on a robust shrinkage covariance estimator with a shrinkage parameter calibrated to minimize the realized portfolio risk. Our strategy has been shown to be robust to finite-sample effects as well as to the impulsive characteristics of the data. It has been demonstrated that our approach outperforms more standard techniques in terms of the realized portfolio risk, both for synthetic data and for real historical stock returns from Hong Kong's HSI. Although we base our analysis on the assumption that outliers are absent, a recent study [39] has shown that the robust covariance estimator $\hat C_{ST}$ is resilient to arbitrary outliers by appropriately weighting good versus outlying data. This is somewhat confirmed by our real data tests and is worth investigating further.















TABLE I

Realized portfolio risks (annualized standard deviations) and the corresponding p-values under different covariance matrix estimators.

| Dataset | Statistic | $\hat C_{ST}$ | $\hat C_P$ | $\hat C_C$ | $\hat C_{C2}$ | $\hat C_{LW}$ | $\hat C_R$ | $I_N$ |
|---|---|---|---|---|---|---|---|---|
| HSI | Risk (n=300) | 0.0419 | 0.0433** | 0.0428* | 0.0430* | 0.0438** | 0.0439** | 0.1112** |
| HSI | p-value | 1.000 | 0.009 | 0.028 | 0.041 | 0.001 | 0.001 | 0.000 |



Fig. 3. Realized portfolio risks achieved out-of-sample over 736 days of 
HSI real market data (from Jan. 3, 2011 to Dec. 31, 2013) by a GMVP 
implemented using different covariance estimators. 



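Even though the GMVP is not an optimal portfolio in terms of the Sharpe ratio or return maximization at a given level of risk, many empirical studies [40,41] have shown that an investment in the GMVP often yields better out-of-sample results than other mean-variance portfolios, because of the poor estimates of the means of the asset returns. Therefore, besides the robust estimation of the covariance matrix, it would be of interest to take into account the robust estimation of the means. Robust approaches could then be developed for the various portfolio optimization strategies that involve both the estimates of the means and the covariance matrix of the asset returns, such as Sharpe ratio maximization or Markowitz's mean-variance portfolio optimization. These considerations are left to future work.

Fig. 4. Annualized rolling-window standard deviations of the most recent 70 out-of-sample log returns for the GMVP based on different covariance matrix estimators.


Appendix A
Preliminary results

In this appendix we provide some preparatory lemmas that are essential for the proof of the main theorems. From now on, for readability, we discard all unnecessary indices ρ when no confusion is possible.

We start by rewriting $\hat C_{ST}$ in a more convenient form. Denoting

$$\tilde z_t = \frac{\tilde x_t}{\sqrt{\tau_t}} = z_t - \frac{1}{\sqrt{\tau_t}}\frac{1}{n}Z_N\sqrt{\tau}, \quad t = 1,2,\dots,n,$$

with $\sqrt{\tau} = (\sqrt{\tau_1},\dots,\sqrt{\tau_n})^T$ and $Z_N = [z_1,\dots,z_n]$, after some basic algebra, we obtain

$$\hat C_{ST} = (1-\rho)\frac{1}{n}\sum_{t=1}^n\frac{\tilde z_t\tilde z_t^T}{\frac{1}{N}\tilde z_t^T\hat C_{ST}^{-1}\tilde z_t} + \rho I_N.$$

Denote $\hat C_{(t)} = \hat C_{ST} - (1-\rho)\frac{1}{n}\frac{\tilde z_t\tilde z_t^T}{\frac{1}{N}\tilde z_t^T\hat C_{ST}^{-1}\tilde z_t}$. Using $(A + rvv^T)^{-1}v = A^{-1}v/(1 + rv^TA^{-1}v)$ for a positive definite matrix A, a vector v and a scalar r > 0, we have

$$\frac{1}{N}\tilde z_t^T\hat C_{ST}^{-1}\tilde z_t = \frac{\frac{1}{N}\tilde z_t^T\hat C_{(t)}^{-1}\tilde z_t}{1 + (1-\rho)c_N\dfrac{\frac{1}{N}\tilde z_t^T\hat C_{(t)}^{-1}\tilde z_t}{\frac{1}{N}\tilde z_t^T\hat C_{ST}^{-1}\tilde z_t}},$$

so that

$$\frac{1}{N}\tilde z_t^T\hat C_{ST}^{-1}\tilde z_t = \left(1-(1-\rho)c_N\right)\frac{1}{N}\tilde z_t^T\hat C_{(t)}^{-1}\tilde z_t, \tag{9}$$

and we can rewrite $\hat C_{ST}$ as

$$\hat C_{ST} = \frac{1-\rho}{1-(1-\rho)c_N}\frac{1}{n}\sum_{t=1}^n\frac{\tilde z_t\tilde z_t^T}{\frac{1}{N}\tilde z_t^T\hat C_{(t)}^{-1}\tilde z_t} + \rho I_N.$$

For $t\in\{1,\dots,n\}$, denote $d_t(\rho) = \frac{1}{N}\tilde z_t^T\hat C_{(t)}^{-1}(\rho)\tilde z_t$. The following lemma gives a deterministic approximation of $d_t(\rho)$. It later helps to show that, up to scaling, $\hat C_{ST}$ is somewhat similar to $\frac{1}{n}Z_NZ_N^T$, which is not observable.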
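Identity (9) is Sherman-Morrison algebra, so it can be checked numerically. The sketch below uses hypothetical sizes and seed, with centred Gaussian rows playing the role of the $\tilde z_t$. It computes $\hat C_{ST}$ by plain fixed-point iteration, removes the t-th term to form $\hat C_{(t)}$, and compares both sides of (9).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, rho = 20, 40, 0.5
Z = rng.standard_normal((n, N))
Z -= Z.mean(axis=0)                       # centred rows play the role of z~_t

# C_ST by plain fixed-point iteration (same form as the rewritten estimator)
C = np.eye(N)
for _ in range(2000):
    q = np.einsum("ti,ti->t", Z, np.linalg.solve(C, Z.T).T) / N
    C_new = (1 - rho) * (Z.T / q) @ Z / n + rho * np.eye(N)
    done = np.linalg.norm(C_new - C) <= 1e-13 * np.linalg.norm(C)
    C = C_new
    if done:
        break

t = 0
z = Z[t]
q_t = z @ np.linalg.solve(C, z) / N                   # (1/N) z~_t^T C_ST^{-1} z~_t
C_loo = C - (1 - rho) * np.outer(z, z) / (n * q_t)    # C_(t): term t removed
d_t = z @ np.linalg.solve(C_loo, z) / N               # d_t(rho)
lhs = q_t
rhs = (1 - (1 - rho) * N / n) * d_t                   # right-hand side of (9)
print(lhs, rhs)
```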


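Lemma 2. Under the settings of Theorem 1, as $N, n \to \infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon}\max_{1\le t\le n}\left|d_t(\rho) - \gamma(\rho)\right| \xrightarrow{\text{a.s.}} 0.$$

Proof: This is proved via a contradiction argument, which follows along lines similar to the proof in [32]. The main difference lies in that we re-center the sample data by subtracting the sample mean, while the samples are assumed to be zero mean in [32]. After the sample mean is subtracted, the re-centered data are correlated and some terms still remain in $\hat C_{ST}$, which introduces new technical difficulties.

Assuming (by relabelling) that $d_1(\rho) \le \dots \le d_n(\rho)$, we first prove that for any fixed $\ell > 0$, $d_n(\rho)$ is bounded above by $\gamma(\rho) + \ell$ for all large n, uniformly on $\rho\in\mathcal{R}_\varepsilon$. Since $U \succeq V \Rightarrow V^{-1} \succeq U^{-1}$ for positive definite matrices U and V, we obtain

$$d_n(\rho) = \frac{1}{N}\tilde z_n^T\left(\frac{1-\rho}{1-(1-\rho)c_N}\frac{1}{n}\sum_{t=1}^{n-1}\frac{\tilde z_t\tilde z_t^T}{d_t(\rho)} + \rho I_N\right)^{-1}\tilde z_n \le \frac{1}{N}\tilde z_n^T\left(\frac{1-\rho}{1-(1-\rho)c_N}\frac{1}{n}\sum_{t=1}^{n-1}\frac{\tilde z_t\tilde z_t^T}{d_n(\rho)} + \rho I_N\right)^{-1}\tilde z_n.$$

Since $\tilde z_n \neq 0$ with probability one, dividing both sides by $d_n(\rho) > 0$ implies

$$1 \le \frac{1}{N}\tilde z_n^T\left(\frac{1-\rho}{1-(1-\rho)c_N}\frac{1}{n}\sum_{t=1}^{n-1}\tilde z_t\tilde z_t^T + d_n(\rho)\rho I_N\right)^{-1}\tilde z_n. \tag{10}$$

Assume that there exists a sequence $\{\rho_n\}_{n=1}^\infty$ over which $d_n(\rho_n) > \gamma(\rho_n) + \ell$ infinitely often, for some fixed $\ell > 0$. Since $\{\rho_n\}_{n=1}^\infty$ is bounded, it has a limit point $\rho_0\in\mathcal{R}_\varepsilon$. Let us restrict ourselves to such a subsequence on which $\rho_n \to \rho_0 > 0$ and $d_n(\rho_n) > \gamma(\rho_n) + \ell$. On this subsequence, from (10), we have $m_{N,n} > 1$, where $m_{N,n} = \frac{1}{N}\tilde z_n^TM_{N,n}^{-1}\tilde z_n$ and

$$M_{N,n} = \frac{1-\rho_n}{1-(1-\rho_n)c_N}\frac{1}{n}\sum_{t=1}^{n-1}\tilde z_t\tilde z_t^T + \left(\gamma(\rho_n)+\ell\right)\rho_n I_N. \tag{11}$$

The quadratic form $m_{N,n}$ is amenable to large random matrix analysis. The first step is to remove the effect of the sample mean. For $1 \le j \le n$, let $m_{N,j}$ and $M_{N,j}$ be defined as above with $\tilde z_j$ in place of $\tilde z_n$ and the sum taken over $t \neq j$. Denote their non-centered counterparts by

$$\check m_{N,j} = \frac{1}{N}z_j^T\check M_{N,j}^{-1}z_j, \qquad \check M_{N,j} = \frac{1-\rho_n}{1-(1-\rho_n)c_N}\frac{1}{n}\sum_{t\neq j}z_tz_t^T + \left(\gamma(\rho_n)+\ell\right)\rho_n I_N.$$

We have in particular:

Proposition 1. As $N, n \to \infty$,

$$\max_{1\le j\le n}\left|m_{N,j} - \check m_{N,j}\right| \xrightarrow{\text{a.s.}} 0.$$

Proof: See Appendix F.

Remark 2. In Proposition 1, Assumption 1-b is necessary; that is, $\tau_1,\dots,\tau_n$ are i.i.d. with $\tau_t > \xi$ a.s. for some $\xi > 0$ and $E[\tau_1] < \infty$. It guarantees that for $t = 1,\dots,n$, the norm of $\tilde z_t$ does not go off to infinity, recalling that $\tilde z_t = z_t - \frac{1}{\sqrt{\tau_t}}\frac{1}{n}Z_N\sqrt{\tau}$.

By Proposition 1, we have $|m_{N,n} - \check m_{N,n}| \xrightarrow{\text{a.s.}} 0$. This allows us to follow the proof in [32], which deals with data with mean zero.

To proceed, assume first $\rho_0 \neq 1$. From the proof of Theorem 1 in [32],

$$\check m_{N,n} - m^+ \xrightarrow{\text{a.s.}} 0, \qquad m^+ = \frac{1-(1-\rho_0)c}{1-\rho_0}\,\delta\!\left(-\left(\gamma(\rho_0)+\ell\right)\rho_0\frac{1-(1-\rho_0)c}{1-\rho_0}\right), \tag{12}$$

where, for $x < 0$, $\delta(x)$ is the unique positive solution to

$$\delta(x) = \int\frac{t}{-x + \frac{t}{1+c\,\delta(x)}}\,\nu(dt).$$

Together with $|m_{N,n} - \check m_{N,n}| \xrightarrow{\text{a.s.}} 0$, we have $|m_{N,n} - m^+| \xrightarrow{\text{a.s.}} 0$. It was demonstrated in [32] that $m^+ < 1$. But this is in contradiction with $m_{N,n} > 1$.

Now assume $\rho_0 = 1$. Then the coefficient $\frac{1-\rho_n}{1-(1-\rho_n)c_N}$ vanishes while $\|\frac{1}{n}Z_NZ_N^T\|$ remains bounded a.s., and, according to [32],

$$\check m_{N,n} - \frac{\gamma(1)}{\gamma(1)+\ell} \xrightarrow{\text{a.s.}} 0, \tag{13}$$

where $\gamma(1) = \int t\,\nu(dt)$ by (5). Hence $m_{N,n} - \frac{\gamma(1)}{\gamma(1)+\ell} \xrightarrow{\text{a.s.}} 0$, but $\frac{\gamma(1)}{\gamma(1)+\ell} < 1$, again raising a contradiction with $m_{N,n} > 1$.

Hence, for all large n, there is no converging subsequence of $\rho_n$ (and thus no subsequence of $\rho_n$) for which $d_n(\rho_n) > \gamma(\rho_n) + \ell$ infinitely often. Therefore $d_n(\rho) < \gamma(\rho) + \ell$ for all large n a.s., uniformly on $\rho\in\mathcal{R}_\varepsilon$.

The same reasoning holds for $d_1(\rho)$, which can be proved greater than $\gamma(\rho) - \ell$ for all large n uniformly on $\rho\in\mathcal{R}_\varepsilon$. Following the same arguments as in [32], since $\ell > 0$ is arbitrary, the ordering of the $d_t(\rho)$ yields $\sup_{\rho\in\mathcal{R}_\varepsilon}\max_{1\le t\le n}|d_t(\rho) - \gamma(\rho)| \xrightarrow{\text{a.s.}} 0$. ■

The following three lemmas, Lemmas 3, 4 and 5, show that functionals of Tyler's estimator asymptotically behave similarly to functionals of $\frac{1}{n}\sum_{t=1}^n\tilde z_t\tilde z_t^T$ or $\frac{1}{n}\sum_{t=1}^nz_tz_t^T$. They are used as an intermediate step in the development of the asymptotic deterministic equivalent of the risk function. Using existing results in [33], quoted as Lemma 6 in this paper, we can then obtain our main theorems.

For notational convenience, we denote $k = k(\rho) = \frac{1-\rho}{1-(1-\rho)c}$. Also recall that γ is the unique positive solution to $1 = \int\frac{t}{\gamma\rho+(1-\rho)t}\,\nu(dt)$. Assume $A_N\in\mathbb{R}^{N\times N}$ is a deterministic symmetric nonnegative definite matrix. For some fixed $\eta > 0$, define

$$\mathcal{D} = \begin{cases}[0,\infty) & \text{if } \liminf_N\lambda_1(A_N) > 0,\\ [\eta,\infty) & \text{otherwise,}\end{cases}$$

and further define, for $\rho\in\mathcal{R}_\varepsilon$ and $w\in\mathcal{D}$,

$$R_N = \left(A_N + (1-\rho)\frac{1}{n}\sum_{t=1}^n\frac{\tilde x_t\tilde x_t^T}{\frac{1}{N}\tilde x_t^T\hat C_{ST}^{-1}\tilde x_t} + wI_N\right)^{-1},$$

$$S_N = \left(A_N + \frac{k}{\gamma}\frac{1}{n}\sum_{t=1}^n\tilde z_t\tilde z_t^T + wI_N\right)^{-1},$$

$$\bar S_N = \left(A_N + \frac{k}{\gamma}\frac{1}{n}\sum_{t=1}^nz_tz_t^T + wI_N\right)^{-1}.$$

Then we introduce the following lemma.

Lemma 3. Assume $a_N\in\mathbb{R}^N$ is a deterministic vector with $\limsup_N\|a_N\| < \infty$. Under the settings of Theorem 1, as $N, n \to \infty$,

$$\sup_{\rho\in\mathcal{R}_\varepsilon,\,w\in\mathcal{D}}\left|a_N^TR_Na_N - a_N^T\bar S_Na_N\right| \xrightarrow{\text{a.s.}} 0. \tag{14}$$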


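Proof: Define
$$B_N(\rho) = \frac{1-\rho}{n}\sum_{t=1}^n \frac{\hat z_t \hat z_t^T}{\frac{1}{N}\hat z_t^T \hat C_{ST}^{-1}(\rho)\hat z_t} \overset{(a)}{=} \frac{1-\rho}{1-(1-\rho)c_N}\,\frac{1}{n}\sum_{t=1}^n \frac{\hat z_t \hat z_t^T}{d_t(\rho)}, \qquad D_N(\rho) = \frac{k}{\gamma(\rho)}\,\frac{1}{n}\sum_{t=1}^n \hat z_t \hat z_t^T,$$
where (a) uses the identity (9). Since $\tilde x_t = \sqrt{\tau_t}\,\hat z_t$ and each summand of $R_N^{-1}$ is invariant to a rescaling of $\tilde x_t$, we have $R_N = (A_N + B_N(\rho) + wI_N)^{-1}$ and $S_N = (A_N + D_N(\rho) + wI_N)^{-1}$. Denote
$$\Delta = a_N^T R_N a_N - a_N^T S_N a_N \overset{(b)}{=} a_N^T R_N \left(D_N(\rho) - B_N(\rho)\right) S_N a_N,$$
where (b) uses the identity $U^{-1} - V^{-1} = U^{-1}(V - U)V^{-1}$ for invertible matrices $U, V$. We first prove that, as $N, n \to \infty$, $\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in \mathcal{D}} |\Delta| \xrightarrow{\text{a.s.}} 0$.

As $N, n \to \infty$, using the definition of $k$,
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left\|B_N(\rho) - D_N(\rho)\right\| \le \sup_{\rho \in \mathcal{R}_\varepsilon} \max_{1 \le t \le n} \frac{1-\rho}{1-(1-\rho)c_N}\,\frac{|d_t(\rho) - \gamma(\rho)|}{\gamma(\rho)\, d_t(\rho)}\, \left\|\frac{1}{n}\sum_{t=1}^n \hat z_t \hat z_t^T\right\|. \qquad (15)$$
We will show that the RHS of (15) goes to 0 a.s. Recalling Lemma 2, this follows upon showing that $\limsup_n \|\frac{1}{n}\sum_{t=1}^n \hat z_t \hat z_t^T\| < \infty$ a.s. To this end, recall that $\hat z_t = z_t - \frac{1}{n}Z_N\frac{\sqrt{\tau}}{\sqrt{\tau_t}}$, with $\sqrt{\tau} = (\sqrt{\tau_1}, \ldots, \sqrt{\tau_n})^T$. Then
$$\frac{1}{n}\sum_{t=1}^n \hat z_t \hat z_t^T = \frac{1}{n}\sum_{t=1}^n z_t z_t^T - \left(\frac{1}{n}\sum_{t=1}^n \frac{z_t}{\sqrt{\tau_t}}\right)\left(\frac{1}{n}Z_N\sqrt{\tau}\right)^T - \left(\frac{1}{n}Z_N\sqrt{\tau}\right)\left(\frac{1}{n}\sum_{t=1}^n \frac{z_t}{\sqrt{\tau_t}}\right)^T + \left(\frac{1}{n}\sum_{t=1}^n \frac{1}{\tau_t}\right)\left(\frac{1}{n}Z_N\sqrt{\tau}\right)\left(\frac{1}{n}Z_N\sqrt{\tau}\right)^T. \qquad (16)$$

We will show that the spectral norm of each term on the RHS of (16) is bounded for all large $n$ a.s. First, from Assumption 1-c and [42], we have $\limsup_n \|\frac{1}{n}\sum_{t=1}^n z_t z_t^T\| < \infty$ a.s. Next, the second and the third terms on the RHS of (16) are transposes of each other, and
$$\left\|\frac{1}{n}\sum_{t=1}^n \frac{z_t}{\sqrt{\tau_t}}\right\|^2 \le \left\|\frac{1}{n}\sum_{t=1}^n z_t z_t^T\right\| \frac{1}{n}\sum_{t=1}^n \frac{1}{\tau_t}, \qquad \left\|\frac{1}{n}Z_N\sqrt{\tau}\right\|^2 \le \left\|\frac{1}{n}\sum_{t=1}^n z_t z_t^T\right\| \frac{1}{n}\sum_{t=1}^n \tau_t.$$
By the law of large numbers, $\frac{1}{n}\sum_{t=1}^n \tau_t \xrightarrow{\text{a.s.}} \mathbb{E}[\tau_1] < \infty$, while Assumption 1-b gives $\frac{1}{n}\sum_{t=1}^n \tau_t^{-1} \le \xi^{-1}$; hence both terms are bounded in norm for all large $n$ a.s. For the fourth term, with Assumption 1-b, we have
$$\limsup_n \left(\frac{1}{n}\sum_{t=1}^n \frac{1}{\tau_t}\right)\left\|\frac{1}{n}Z_N\sqrt{\tau}\right\|^2 \le \limsup_n \frac{1}{\xi}\left\|\frac{1}{n}\sum_{t=1}^n z_t z_t^T\right\| \frac{1}{n}\sum_{t=1}^n \tau_t < \infty \quad \text{a.s.}$$
Therefore, $\limsup_n \|\frac{1}{n}\sum_{t=1}^n \hat z_t \hat z_t^T\| < \infty$ a.s. Together with Lemma 2, from (15), we have
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left\|B_N(\rho) - D_N(\rho)\right\| \xrightarrow{\text{a.s.}} 0. \qquad (17)$$
Note that $w \in \mathcal{D}$ ensures $\limsup_N \sup_{\rho \in \mathcal{R}_\varepsilon, w \in \mathcal{D}} \|R_N\| < \infty$ and $\limsup_N \sup_{\rho \in \mathcal{R}_\varepsilon, w \in \mathcal{D}} \|S_N\| < \infty$. Together with (17) and $\limsup_N \|a_N\| < \infty$, we have
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in \mathcal{D}} |\Delta| \le \|a_N\|^2 \sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in \mathcal{D}} \|R_N\| \sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in \mathcal{D}} \|S_N\| \sup_{\rho \in \mathcal{R}_\varepsilon} \left\|B_N(\rho) - D_N(\rho)\right\| \xrightarrow{\text{a.s.}} 0.$$
Following the same reasoning as that of Proposition 1, we have
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in \mathcal{D}} \left| a_N^T S_N a_N - a_N^T \bar S_N a_N \right| \xrightarrow{\text{a.s.}} 0.$$
Together with $\sup_{\rho \in \mathcal{R}_\varepsilon, w \in \mathcal{D}} |\Delta| \xrightarrow{\text{a.s.}} 0$, we obtain (14). $\blacksquare$

Define
$$\bar W_N = \left(A_N + \frac{k}{\gamma}\,\frac{1}{n}\sum_{t=1}^n y_t y_t^T + wI_N\right)^{-1}$$
and
$$W_N = \left(A_N + \frac{1-\rho}{n}\sum_{t=1}^n \frac{\hat y_t \hat y_t^T}{\frac{1}{N}\hat y_t^T C_N^{1/2}\hat C_{ST}^{-1}(\rho) C_N^{1/2}\hat y_t} + wI_N\right)^{-1},$$
where $\hat y_t = y_t - \frac{1}{n}\sum_{i=1}^n \frac{\sqrt{\tau_i}}{\sqrt{\tau_t}}\, y_i$. We introduce the following lemma.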


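Lemma 4. Under the settings of Lemma 3, as $N, n \to \infty$,
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in \mathcal{D}} \left| a_N^T W_N a_N - a_N^T \bar W_N a_N \right| \xrightarrow{\text{a.s.}} 0. \qquad (18)$$

Proof: The derivation is similar to that of (14).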


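Lemma 5. Under the settings of Lemma 3 and assuming $A_N = 0$, as $N, n \to \infty$,
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in [\eta, \infty)} \left| (1-\rho)\frac{1}{n}\sum_{t=1}^n \frac{(a_N^T R_N \tilde x_t)^2}{\frac{1}{N}\tilde x_t^T \hat C_{ST}^{-1}(\rho)\tilde x_t} - \frac{k}{\gamma}\,\frac{1}{n}\sum_{t=1}^n (a_N^T \bar S_N z_t)^2 \right| \xrightarrow{\text{a.s.}} 0. \qquad (19)$$

Proof: We first notice that, since $A_N = 0$,
$$(1-\rho)\frac{1}{n}\sum_{t=1}^n \frac{(a_N^T R_N \tilde x_t)^2}{\frac{1}{N}\tilde x_t^T \hat C_{ST}^{-1}(\rho)\tilde x_t} = a_N^T R_N \left(R_N^{-1} - wI_N\right) R_N a_N = a_N^T R_N a_N + w\frac{d}{dw}\left(a_N^T R_N a_N\right).$$
Following similar steps, we also have
$$\frac{k}{\gamma}\,\frac{1}{n}\sum_{t=1}^n (a_N^T \bar S_N z_t)^2 = a_N^T \bar S_N a_N + w\frac{d}{dw}\left(a_N^T \bar S_N a_N\right).$$
The almost sure convergence (14) in Lemma 3, when extended to $w \in \mathbb{C}$, is uniform on any bounded region of $(\mathbb{C} \setminus \mathbb{R}) \cup \mathcal{D}$, and the functionals of $w$ in (14) are analytic. Thus, by the Weierstrass convergence theorem [43], the following holds:
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in [\eta, \infty)} \left| \frac{d}{dw}\left(a_N^T R_N a_N\right) - \frac{d}{dw}\left(a_N^T \bar S_N a_N\right) \right| \xrightarrow{\text{a.s.}} 0.$$
Setting $A_N$ and $a_N$ accordingly in (20), as well as $A_N = \rho C_N^{-1}$ in $W_N$, yields the corresponding convergence for $\frac{1}{N}1_N^T C_N^{-1/2} W_N C_N^{-1/2} 1_N$. Together with Lemma 3, we obtain (19). $\blacksquare$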
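The identity at the heart of this proof holds for any resolvent: with $R(w) = (M + wI)^{-1}$ and $M$ nonnegative definite, $a^TRMRa = a^TRa + w\frac{d}{dw}(a^TRa)$, because $RMR = R - wR^2$ and $\frac{d}{dw}R = -R^2$. A small numerical check follows; the random $M$ merely stands in for the Tyler-weighted sum in $R_N^{-1}$, and the exact derivative is compared with a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
G = rng.standard_normal((N, N))
M = G @ G.T / N                      # stand-in nonnegative definite matrix
a = rng.standard_normal(N) / np.sqrt(N)

def quad(w):
    # a^T (M + w I)^{-1} a
    return a @ np.linalg.solve(M + w * np.eye(N), a)

w, h = 0.7, 1e-5
R = np.linalg.inv(M + w * np.eye(N))
lhs = a @ R @ M @ R @ a                              # a^T R M R a
deriv_exact = -a @ R @ R @ a                         # d/dw a^T R a = -a^T R^2 a
deriv_fd = (quad(w + h) - quad(w - h)) / (2 * h)     # central finite difference
rhs = quad(w) + w * deriv_exact
print(lhs, rhs, deriv_exact, deriv_fd)
```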

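Lemma 6. [33, Appendix I-B] Under the settings of Lemma 3, as $N, n \to \infty$,
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in \mathcal{D}} \left| a_N^T \bar S_N a_N - a_N^T T_N a_N \right| \xrightarrow{\text{a.s.}} 0, \qquad (20)$$
where $T_N = \left(A_N + \frac{k}{\gamma + e_N(w)k}\, C_N + wI_N\right)^{-1}$ and, for each $w \in \mathcal{D}$, $e_N(w)$ is the unique positive solution to the following equation:
$$e_N(w) = \frac{1}{n}\operatorname{tr} C_N \left(A_N + \frac{k}{\gamma + e_N(w)k}\, C_N + wI_N\right)^{-1}.$$
Moreover, when $A_N = 0$, we have
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in [\eta, \infty)} \left| a_N^T \bar S_N \left(\frac{k}{\gamma}\,\frac{1}{n}\sum_{t=1}^n z_t z_t^T\right) \bar S_N a_N - \frac{\frac{k\gamma}{(\gamma + e_N(w)k)^2}\, a_N^T T_N C_N T_N a_N}{1 - \frac{k^2}{(\gamma + e_N(w)k)^2}\,\frac{1}{n}\operatorname{tr} C_N^2 T_N^2} \right| \xrightarrow{\text{a.s.}} 0.$$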
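The deterministic side of Lemma 6 can be evaluated directly when $C_N$ is diagonal, which makes two facts easy to check numerically: with $A_N = 0$ and $w = \rho$, the fixed point is exactly $e_N(\rho) = c_N\gamma$ (so that $\frac{k}{\gamma + e_N(\rho)k} = \frac{1-\rho}{\gamma}$); and the closed form in the "Moreover" part equals $f(w) + wf'(w)$ with $f(w) = a_N^TT_Na_N$, mirroring the identity in the proof of Lemma 5. The sketch assumes $A_N = 0$, an arbitrary diagonal spectrum, $a_N = \frac{1}{\sqrt N}1_N$, and $\gamma$ defined by the finite-$N$ equation $1 = \frac{1}{N}\sum_i \frac{\lambda_i}{\gamma\rho + (1-\rho)\lambda_i}$; the helper names are hypothetical.

```python
import numpy as np

def bisect(f, lo, hi, iters=200):
    # Root of a continuous f with f(lo) < 0 < f(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

N, n, rho = 50, 200, 0.4
c = N / n
lam = np.linspace(0.5, 3.0, N)                  # assumed eigenvalues of a diagonal C_N
a = np.ones(N) / np.sqrt(N)                     # a_N = 1_N / sqrt(N)
k = (1 - rho) / (1 - (1 - rho) * c)
gamma = bisect(lambda g: 1 - np.mean(lam / (g * rho + (1 - rho) * lam)),
               1e-12, 2 * lam.max() / rho)

def e_of_w(w):
    # e = (1/n) tr C_N (k/(gamma + e k) C_N + w I)^{-1}, with A_N = 0
    phi = lambda e: np.sum(lam / (k / (gamma + e * k) * lam + w)) / n
    return bisect(lambda e: e - phi(e), 0.0, lam.sum() / (n * w) + 1.0)

def f(w):
    # deterministic equivalent a^T T_N a
    e = e_of_w(w)
    return np.sum(a**2 / (k / (gamma + e * k) * lam + w))

def closed_form(w):
    # right-hand side of the "Moreover" part of Lemma 6
    e = e_of_w(w)
    T = 1.0 / (k / (gamma + e * k) * lam + w)   # diagonal of T_N
    psi = k**2 / (gamma + e * k) ** 2 * np.sum(lam**2 * T**2) / n
    return k * gamma / (gamma + e * k) ** 2 * np.sum(a**2 * lam * T**2) / (1 - psi)

w, h = rho, 1e-4
fd = f(w) + w * (f(w + h) - f(w - h)) / (2 * h)
print(closed_form(w), fd)
print(e_of_w(rho), c * gamma)
```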

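Appendix B
Proof of Theorem 1

First consider the (re-scaled) realized portfolio risk
$$\frac{\frac{1}{N}1_N^T \hat C_{ST}^{-1}(\rho)\, C_N\, \hat C_{ST}^{-1}(\rho) 1_N}{\left(\frac{1}{N}1_N^T \hat C_{ST}^{-1}(\rho) 1_N\right)^2}. \qquad (21)$$

We have
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in [0, \infty)} \left| \frac{1}{N}1_N^T C_N^{-1/2} \bar W_N C_N^{-1/2} 1_N - \frac{1}{N}1_N^T C_N^{-1/2} T_N C_N^{-1/2} 1_N \right| \xrightarrow{\text{a.s.}} 0, \qquad (22)$$
where $A_N = \rho C_N^{-1}$ in $\bar W_N$, $T_N = \left(\rho C_N^{-1} + \left(\frac{k}{\gamma + e_N(w)k} + w\right) I_N\right)^{-1}$ and, for each $w \in [0, \infty)$, $e_N(w)$ is the unique positive solution to the following equation:
$$e_N(w) = \frac{1}{n}\operatorname{tr}\left(\rho C_N^{-1} + \left(\frac{k}{\gamma + e_N(w)k} + w\right) I_N\right)^{-1}.$$
Lemma 4 and the convergence (22) imply
$$\sup_{\rho \in \mathcal{R}_\varepsilon,\, w \in [0, \infty)} \left| \frac{1}{N}1_N^T C_N^{-1/2} W_N C_N^{-1/2} 1_N - \frac{1}{N}1_N^T C_N^{-1/2} T_N C_N^{-1/2} 1_N \right| \xrightarrow{\text{a.s.}} 0.$$
Following the same reasoning as for the proof of Lemma 5, by the Weierstrass convergence theorem the convergence of the derivatives holds at $w = 0$:
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \frac{d}{dw}\left(\frac{1}{N}1_N^T C_N^{-1/2} W_N C_N^{-1/2} 1_N\right)\Big|_{w=0} - \frac{d}{dw}\left(\frac{1}{N}1_N^T C_N^{-1/2} T_N C_N^{-1/2} 1_N\right)\Big|_{w=0} \right| \xrightarrow{\text{a.s.}} 0.$$

For the denominator, Lemma 3 and Lemma 6 imply
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \frac{1}{N}1_N^T \hat C_{ST}^{-1} 1_N - \frac{1}{N}1_N^T \left(\frac{1-\rho}{\gamma} C_N + \rho I_N\right)^{-1} 1_N \right| \xrightarrow{\text{a.s.}} 0.$$
Note that in this case $A_N = 0$, $a_N = \frac{1}{\sqrt{N}}1_N$ and $w = \rho$ (so that $R_N = \hat C_{ST}^{-1}(\rho)$), which leads to $|e_N(\rho) - c\gamma| \to 0$ when $N, n \to \infty$. The derivation is based on Assumption 1-c and the definition of $\gamma$ in (5).

For the numerator, we rewrite it as
$$\frac{1}{N}1_N^T \hat C_{ST}^{-1} C_N \hat C_{ST}^{-1} 1_N = \frac{1}{N}1_N^T C_N^{-1/2}\left(C_N^{-1/2}\hat C_{ST} C_N^{-1/2}\right)^{-2} C_N^{-1/2} 1_N,$$
which, upon substituting the RHS of (3) for $\hat C_{ST}$ and setting $A_N = \rho C_N^{-1}$ in $W_N$, yields
$$\frac{1}{N}1_N^T \hat C_{ST}^{-1} C_N \hat C_{ST}^{-1} 1_N = -\frac{d}{dw}\left(\frac{1}{N}1_N^T C_N^{-1/2} W_N C_N^{-1/2} 1_N\right)\Big|_{w=0}.$$
With Eq. (23) and $|e_N(0) - c\gamma| \to 0$ when $N, n \to \infty$, we have
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \frac{1}{N}1_N^T \hat C_{ST}^{-1} C_N \hat C_{ST}^{-1} 1_N - \frac{\frac{1}{N}1_N^T \left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1} C_N \left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1} 1_N}{1 - c\frac{(1-\rho)^2}{\gamma^2}\,\frac{1}{N}\operatorname{tr} C_N^2 \left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-2}} \right| \xrightarrow{\text{a.s.}} 0.$$

Equipped with the asymptotic equivalences of the denominator and numerator of (21), we prove Theorem 1.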
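The steps of this proof can be checked on the deterministic side with a diagonal $C_N$. The sketch below solves the fixed-point equation for $e_N(w)$ that accompanies (22), then verifies (i) $e_N(0) = c_N\gamma$ exactly at finite $N$, (ii) the closed-form derivative (23) against a central finite difference of $\frac{1}{N}1_N^TC_N^{-1/2}T_NC_N^{-1/2}1_N$, and (iii) that substituting $e_N(0) = c_N\gamma$ turns minus (23) into the limiting numerator expression above. The spectrum, dimensions and $\rho$ are arbitrary assumptions, and $\gamma$ is taken from the finite-$N$ equation $1 = \frac{1}{N}\sum_i\frac{\lambda_i}{\gamma\rho + (1-\rho)\lambda_i}$.

```python
import numpy as np

def bisect(f, lo, hi, iters=200):
    # Root of a continuous f with f(lo) < 0 < f(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

N, n, rho = 50, 200, 0.4
c = N / n
lam = np.linspace(0.5, 3.0, N)       # assumed eigenvalues of a diagonal C_N
k = (1 - rho) / (1 - (1 - rho) * c)
gamma = bisect(lambda g: 1 - np.mean(lam / (g * rho + (1 - rho) * lam)),
               1e-12, 2 * lam.max() / rho)

def beta(e):
    return k / (gamma + e * k)

def e_of_w(w):
    # e = (1/n) tr (rho C^{-1} + (beta(e) + w) I)^{-1}
    phi = lambda e: np.sum(1.0 / (rho / lam + beta(e) + w)) / n
    return bisect(lambda e: e - phi(e), 0.0, lam.sum() / (n * rho) + 1.0)

def g(w):
    # (1/N) 1^T C^{-1/2} T_N C^{-1/2} 1 for diagonal C_N
    return np.mean((1.0 / lam) / (rho / lam + beta(e_of_w(w)) + w))

e0 = e_of_w(0.0)
T0 = 1.0 / (rho / lam + beta(e0))                   # diagonal of T_N at w = 0
eq23 = -np.mean(T0**2 / lam) / (1 - beta(e0) ** 2 * np.sum(T0**2) / n)

D = (1 - rho) / gamma * lam + rho                   # diagonal of (1-rho)/gamma C_N + rho I_N
limit = np.mean(lam / D**2) / (1 - c * (1 - rho) ** 2 / gamma**2 * np.mean(lam**2 / D**2))

h = 1e-5
fd = (g(h) - g(-h)) / (2 * h)
print(e0, c * gamma)
print(eq23, fd, -limit)
```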

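Appendix C
Proof of Lemma 1

First notice that
$$\frac{1}{1-(1-\rho)c_N}\,\frac{\frac{1}{N}\tilde x_t^T \hat C_{ST}^{-1}(\rho)\tilde x_t}{\frac{1}{N}\|\tilde x_t\|^2} = \frac{1}{1-(1-\rho)c_N}\,\frac{\frac{1}{N}\hat z_t^T \hat C_{ST}^{-1}(\rho)\hat z_t}{\frac{1}{N}\|\hat z_t\|^2}.$$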
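$$\frac{d}{dw}\left(\frac{1}{N}1_N^T C_N^{-1/2} T_N C_N^{-1/2} 1_N\right)\Big|_{w=0} = -\frac{\frac{1}{N}1_N^T C_N^{-1/2}\left(\rho C_N^{-1} + \frac{k}{\gamma + e_N(0)k} I_N\right)^{-2} C_N^{-1/2} 1_N}{1 - \frac{k^2}{(\gamma + e_N(0)k)^2}\,\frac{1}{n}\operatorname{tr}\left(\rho C_N^{-1} + \frac{k}{\gamma + e_N(0)k} I_N\right)^{-2}} \qquad (23)$$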


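It has been shown in Lemma 2 that $\sup_{\rho \in \mathcal{R}_\varepsilon} \max_{1 \le t \le n} |d_t(\rho) - \gamma(\rho)| \xrightarrow{\text{a.s.}} 0$, where, by (9), $d_t(\rho) = \frac{1}{1-(1-\rho)c_N}\,\frac{1}{N}\hat z_t^T \hat C_{ST}^{-1}(\rho)\hat z_t$. Therefore, to prove the convergence (7), it remains to show that $\max_{1 \le t \le n} \left| \frac{1}{N}\|\hat z_t\|^2 - \kappa \right| \xrightarrow{\text{a.s.}} 0$.

We start by writing
$$\frac{1}{N}\|\hat z_t\|^2 = \frac{1}{N}z_t^T z_t - \frac{1}{N}z_t^T \frac{Z_N}{n}\frac{\sqrt{\tau}}{\sqrt{\tau_t}} - \frac{1}{N}\frac{\sqrt{\tau}^T}{\sqrt{\tau_t}}\frac{Z_N^T}{n} z_t + \frac{1}{N n^2 \tau_t}\sqrt{\tau}^T Z_N^T Z_N \sqrt{\tau}. \qquad (25)$$
Since the second and the third terms on the RHS of (25) are the same, we analyze the second term only. It can be rewritten as
$$\frac{1}{N}z_t^T \frac{Z_N}{n}\frac{\sqrt{\tau}}{\sqrt{\tau_t}} = \frac{1}{Nn}z_t^T z_t + \frac{1}{Nn}z_t^T Z_{(t)}\frac{\sqrt{\tau_{(t)}}}{\sqrt{\tau_t}},$$
where $Z_{(t)}$ is the matrix $Z_N$ with the $t$th column removed and $\sqrt{\tau_{(t)}}$ is the vector $\sqrt{\tau}$ with the $t$th entry removed. Since $z_t$ is independent of $Z_{(t)}\sqrt{\tau_{(t)}}$, the last term converges to zero a.s. Together with $\frac{1}{N}z_t^T z_t = O(1)$ a.s., we have $\frac{1}{N}z_t^T \frac{Z_N}{n}\frac{\sqrt{\tau}}{\sqrt{\tau_t}} \xrightarrow{\text{a.s.}} 0$.

For the last term in (25),
$$\frac{1}{N n^2 \tau_t}\sqrt{\tau}^T Z_N^T Z_N \sqrt{\tau} \le \frac{1}{N \tau_t}\left\|\frac{1}{n}Z_N^T Z_N\right\| \frac{1}{n}\sum_{i=1}^n \tau_i,$$
where $\tau_t > \xi$ and $\limsup_n \|\frac{1}{n}Z_N^T Z_N\|\,\frac{1}{n}\sum_{i=1}^n \tau_i < \infty$ a.s., so that this term vanishes as well.

Since the last three terms on the RHS of (25) vanish for large $n$, we obtain $\max_{1 \le t \le n}\left| \frac{1}{N}\|\hat z_t\|^2 - \frac{1}{N}\|z_t\|^2 \right| \xrightarrow{\text{a.s.}} 0$. Therefore, as $\frac{1}{N}\|z_t\|^2 \to \kappa$, we obtain $\frac{1}{N}\|\hat z_t\|^2 \to \kappa$, and the convergence (7) unfolds.

Appendix D
Proof of Theorem 2

According to Lemma 5 and Lemma 6, in which we set $A_N = 0$, $w = \rho$ and $a_N = \frac{1}{\sqrt{N}}1_N$, the convergence (26) holds. As $|e_N(\rho) - c\gamma| \to 0$ when $N, n \to \infty$, we substitute $c\gamma$ for $e_N(\rho)$ in (26), giving
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \frac{1}{N}1_N^T \hat C_{ST}^{-1}\left(\hat C_{ST} - \rho I_N\right)\hat C_{ST}^{-1} 1_N - \frac{(1-\rho) - (1-\rho)^2 c}{\gamma}\cdot\frac{\frac{1}{N}1_N^T \left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1} C_N \left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-1} 1_N}{1 - c\frac{(1-\rho)^2}{\gamma^2}\,\frac{1}{N}\operatorname{tr} C_N^2 \left(\frac{1-\rho}{\gamma}C_N + \rho I_N\right)^{-2}} \right| \xrightarrow{\text{a.s.}} 0. \qquad (24)$$
With respect to the asymptotic equivalence in (24), combined with that of the numerator of (21) derived in Appendix B, and upon substituting $\hat\gamma_{sc}$ for $\gamma$, we obtain
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \frac{1}{\kappa}\,\frac{1}{N}1_N^T \hat C_{ST}^{-1} C_N \hat C_{ST}^{-1} 1_N - \frac{\hat\gamma_{sc}}{(1-\rho) - (1-\rho)^2 c_N}\,\frac{1}{N}1_N^T \hat C_{ST}^{-1}\left(\hat C_{ST} - \rho I_N\right)\hat C_{ST}^{-1} 1_N \right| \xrightarrow{\text{a.s.}} 0.$$
Thus we obtain the consistent estimator of $\frac{1}{\kappa}\sigma^2(h_{ST}(\rho))$ in Theorem 2.

Appendix E
Proof of Corollary 1

According to Theorem 2, we have
$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \hat\sigma^2_{sc}(\rho) - \frac{1}{\kappa}\sigma^2(h_{ST}(\rho)) \right| \xrightarrow{\text{a.s.}} 0.$$
Then, by the optimality of $\rho^\circ$ for $\hat\sigma^2_{sc}$ and of $\rho^*$ for $\sigma^2(h_{ST}(\cdot))$, the following holds true:
$$\hat\sigma^2_{sc}(\rho^\circ) \le \hat\sigma^2_{sc}(\rho^*), \qquad \frac{1}{\kappa}\sigma^2(h_{ST}(\rho^*)) \le \frac{1}{\kappa}\sigma^2(h_{ST}(\rho^\circ)),$$
$$\limsup_n \left| \hat\sigma^2_{sc}(\rho^\circ) - \frac{1}{\kappa}\sigma^2(h_{ST}(\rho^\circ)) \right| \le \limsup_n \sup_{\rho \in \mathcal{R}_\varepsilon} \left| \hat\sigma^2_{sc}(\rho) - \frac{1}{\kappa}\sigma^2(h_{ST}(\rho)) \right| = 0 \quad \text{a.s.},$$
$$\limsup_n \left| \hat\sigma^2_{sc}(\rho^*) - \frac{1}{\kappa}\sigma^2(h_{ST}(\rho^*)) \right| \le \limsup_n \sup_{\rho \in \mathcal{R}_\varepsilon} \left| \hat\sigma^2_{sc}(\rho) - \frac{1}{\kappa}\sigma^2(h_{ST}(\rho)) \right| = 0 \quad \text{a.s.}$$
These four relations together ensure that
$$\left| \sigma^2(h_{ST}(\rho^\circ)) - \sigma^2(h_{ST}(\rho^*)) \right| \xrightarrow{\text{a.s.}} 0.$$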
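The four relations above amount to a sandwich argument: if an estimate is uniformly within $\delta$ of the (scaled) realized risk, then the realized risk at the estimate's minimizer exceeds the minimal realized risk by at most $2\delta$. The sketch below illustrates this on a grid of shrinkage values; both curves are made-up stand-ins (hypothetical), not the risk functions of Theorem 2.

```python
import numpy as np

rho = np.linspace(0.05, 1.0, 200)               # grid standing in for R_eps
risk = 1.0 + (rho - 0.37) ** 2                  # hypothetical scaled realized risk
delta = 0.01
estimate = risk + delta * np.sin(40 * rho)      # hypothetical estimator, sup-error <= delta

i_hat = np.argmin(estimate)                     # rho°: minimizer of the estimator
i_star = np.argmin(risk)                        # rho*: minimizer of the realized risk
# risk[i_hat] <= estimate[i_hat] + delta <= estimate[i_star] + delta <= risk[i_star] + 2*delta
gap = risk[i_hat] - risk[i_star]
print(rho[i_hat], rho[i_star], gap)
```

As $\delta \to 0$ (the uniform convergence of Theorem 2), the gap is forced to zero, which is the content of Corollary 1.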
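Appendix F
Proof of Proposition 1

Denote
$$|\hat m_{N,j} - m_{N,j}| = |A - B + C - D|,$$
where $1 \le j \le n$,
$$A = \frac{1}{N}z_j^T M_{N,j}\,\frac{1}{n}Z_N\frac{\sqrt{\tau}}{\sqrt{\tau_j}}, \qquad C = \frac{1}{N}\left(\frac{1}{n}Z_N\frac{\sqrt{\tau}}{\sqrt{\tau_j}}\right)^T M_{N,j}\left(\frac{1}{n}Z_N\frac{\sqrt{\tau}}{\sqrt{\tau_j}}\right),$$
and $B$ and $D$ are the remaining terms, $D$ involving the factor $\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}$.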
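$$\sup_{\rho \in \mathcal{R}_\varepsilon} \left| \frac{1}{N}1_N^T \hat C_{ST}^{-1}\left(\hat C_{ST} - \rho I_N\right)\hat C_{ST}^{-1} 1_N - \frac{\frac{k\gamma}{(\gamma + e_N(\rho)k)^2}\,\frac{1}{N}1_N^T \left(\frac{k}{\gamma + e_N(\rho)k}C_N + \rho I_N\right)^{-1} C_N \left(\frac{k}{\gamma + e_N(\rho)k}C_N + \rho I_N\right)^{-1} 1_N}{1 - \frac{k^2}{(\gamma + e_N(\rho)k)^2}\,\frac{1}{n}\operatorname{tr} C_N^2 \left(\frac{k}{\gamma + e_N(\rho)k}C_N + \rho I_N\right)^{-2}} \right| \xrightarrow{\text{a.s.}} 0 \qquad (26)$$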


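We wish to prove that
$$\mathbb{E}\left[|\hat m_{N,j} - m_{N,j}|^p\right] \le \frac{K_p}{N^p} \qquad (27)$$
for $p > 1$, where the constant $K_p$ depends on $p$ but not on $N$. Then, taking $p > 2$, along with the union bound, the Markov inequality and the Borel-Cantelli lemma, completes the proof of Proposition 1.

Using the Minkowski inequality, we have
$$\mathbb{E}\left[|\hat m_{N,j} - m_{N,j}|^p\right] \le \left(\mathbb{E}^{1/p}[|A|^p] + \mathbb{E}^{1/p}[|B|^p] + \mathbb{E}^{1/p}[|C|^p] + \mathbb{E}^{1/p}[|D|^p]\right)^p.$$
Thus, to prove (27), it is enough to show that $\mathbb{E}[|A|^p] \le K_p/N^p$, $\mathbb{E}[|B|^p] \le K_p/N^p$, $\mathbb{E}[|C|^p] \le K_p/N^p$ and $\mathbb{E}[|D|^p] \le K_p/N^p$.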
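The requirement $p > 2$ comes from the union bound over the $n$ indices $j$: with $n = N/c$, Markov's inequality and the moment bound $K_p/N^p$ give $P(\max_j|\hat m_{N,j} - m_{N,j}| > \epsilon) \le nK_p/(\epsilon^pN^p) \propto N^{1-p}$, which is summable over $N$ exactly when $p > 2$, as the Borel-Cantelli lemma requires. The sketch below compares partial sums for $p = 3$ and $p = 2$; the values of $c$, $K_p$ and $\epsilon$ are arbitrary placeholders.

```python
import math

def union_markov_bound(N, p, c=0.5, K=1.0, eps=0.1):
    # n * K_p / (eps^p N^p) with n = N / c (placeholder constants)
    n = N / c
    return n * K / (eps**p * N**p)

def partial_sum(p, N_max):
    return sum(union_markov_bound(N, p) for N in range(1, N_max + 1))

# p = 3: terms are (1/(c eps^3)) / N^2, so the series converges to (pi^2/6) / (c eps^3).
print(partial_sum(3, 10**5), math.pi**2 / 6 / (0.5 * 0.1**3))
# p = 2: terms are (1/(c eps^2)) / N, a harmonic series that keeps growing.
print(partial_sum(2, 100), partial_sum(2, 10**5))
```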


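Two bounds are used for the term $A_1$ and the subsequent terms below. First,
$$\|M_{N,j}\| \le \frac{1}{(\gamma(\rho_n) + \varepsilon)\rho_n}.$$
Second, by Minkowski's inequality,
$$\mathbb{E}\left[\|y_j\|^{2p}\right] = \mathbb{E}\left[\left(\sum_{i=1}^N y_{ij}^2\right)^p\right] \le N^p\,\mathbb{E}\left[|y_{ij}|^{2p}\right] \le K_p N^p.$$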


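A. Moments of $|A|$, $|B|$ and $|C|$

Start by noting that
$$\mathbb{E}[|A|^p] = \frac{1}{N^p n^p}\mathbb{E}\left[\left| z_j^T M_{N,j} Z_N \frac{\sqrt{\tau}}{\sqrt{\tau_j}} \right|^p\right] = \frac{1}{N^p n^p}\mathbb{E}\left[\left| z_j^T M_{N,j}\left(z_j + Z_{(j)}\frac{\sqrt{\tau_{(j)}}}{\sqrt{\tau_j}}\right)\left(z_j + Z_{(j)}\frac{\sqrt{\tau_{(j)}}}{\sqrt{\tau_j}}\right)^T M_{N,j} z_j \right|^{p/2}\right] \overset{(a)}{\le} A_1 + A_2 + A_3 + A_4,$$
where (a) follows from Jensen's inequality and
$$A_1 = \frac{4^{p/2-1}}{N^p n^p}\mathbb{E}\left[\left| z_j^T M_{N,j} z_j \right|^p\right], \qquad A_2 = \frac{4^{p/2-1}}{N^p n^p}\mathbb{E}\left[\left| z_j^T M_{N,j} Z_{(j)}\frac{\sqrt{\tau_{(j)}}}{\sqrt{\tau_j}} \right|^p\right],$$
$$A_3 = \frac{4^{p/2-1}}{N^p n^p}\mathbb{E}\left[\left| z_j^T M_{N,j} z_j\,\frac{\sqrt{\tau_{(j)}}^T}{\sqrt{\tau_j}} Z_{(j)}^T M_{N,j} z_j \right|^{p/2}\right], \qquad A_4 = \frac{4^{p/2-1}}{N^p n^p}\mathbb{E}\left[\left| z_j^T M_{N,j} Z_{(j)}\frac{\sqrt{\tau_{(j)}}}{\sqrt{\tau_j}}\, z_j^T M_{N,j} z_j \right|^{p/2}\right].$$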


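Thus, using the bounds on $\|M_{N,j}\|$ and $\mathbb{E}[\|y_j\|^{2p}]$ given above,
$$A_1 \le \frac{1}{N^p}\,\frac{K_p \|C_N\|^p}{(\gamma(\rho_n) + \varepsilon)^p \rho_n^p} \le \frac{K_p}{N^p}.$$
Now consider $A_2$. With $Q_N = M_{N,j} Z_{(j)}\frac{\sqrt{\tau_{(j)}}\sqrt{\tau_{(j)}}^T}{\tau_j} Z_{(j)}^T M_{N,j}$,
$$A_2 = \frac{4^{p/2-1}}{N^p n^p}\mathbb{E}\left[\left| z_j^T Q_N z_j \right|^{p/2}\right] \overset{(a)}{\le} \frac{K_p}{N^p n^p}\left(\mathbb{E}\left[\left| z_j^T Q_N z_j - \operatorname{tr}(C_N Q_N) \right|^{p/2}\right] + \mathbb{E}\left[\left| \operatorname{tr}(C_N Q_N) \right|^{p/2}\right]\right) \overset{(b)}{\le} \frac{K_p}{N^p},$$
where (a) follows from Jensen's inequality and (b) follows from the trace lemma [44, Lemma B.26].

For $A_3$, the Cauchy-Schwarz inequality gives
$$A_3 \le \frac{4^{p/2-1}}{N^p n^p}\,\mathbb{E}^{1/2}\left[\left| z_j^T M_{N,j} z_j \right|^p\right]\mathbb{E}^{1/2}\left[\left| z_j^T Q_N z_j \right|^{p/2}\right] = \sqrt{A_1 A_2}.$$
As we have $A_1 \le K_p/N^p$ and $A_2 \le K_p/N^p$, we obtain $A_3 \le K_p/N^p$. Following the same reasoning as for $A_3$, we also get $A_4 \le K_p/N^p$.

Therefore, we obtain
$$\mathbb{E}[|A|^p] \le A_1 + A_2 + A_3 + A_4 \le \frac{K_p}{N^p}.$$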
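The trace lemma used for $A_2$ controls how far a quadratic form $z^TAz$ strays from its mean when $z$ is independent of $A$. For standard Gaussian $z$ and $p = 2$ it reduces to the exact identity $\mathbb{E}|z^TAz - \operatorname{tr}A|^2 = 2\operatorname{tr}(A^2)$ for symmetric $A$, which the Monte Carlo sketch below illustrates. The Gaussian entries, the identity covariance and the random symmetric $A$ are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials = 50, 20000
G = rng.standard_normal((N, N))
A = (G + G.T) / (2 * np.sqrt(N))            # fixed symmetric matrix, independent of z
z = rng.standard_normal((trials, N))        # i.i.d. standard Gaussian vectors z
quad = np.einsum("ti,ij,tj->t", z, A, z)    # z^T A z for each trial
mc = np.mean((quad - np.trace(A)) ** 2)     # Monte Carlo estimate of E|z^T A z - tr A|^2
exact = 2 * np.trace(A @ A)
print(mc / exact)                           # close to 1
```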
The terms $D_1$, $D_2$ and $D_3$ each carry the prefactor $\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}$.




The same reasoning holds for $\mathbb{E}[|B|^p]$, giving $\mathbb{E}[|B|^p] \le K_p/N^p$. As for the moments of $|C|$, we proceed as for $A_1$:
$$\mathbb{E}[|C|^p] \le \frac{1}{N^p}\,\frac{K_p \|C_N\|^p}{(\gamma(\rho_n) + \varepsilon)^p \rho_n^p} \le \frac{K_p}{N^p}.$$

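B. Moments of $|D|$

We denote $\frac{1}{\sqrt{\tau}} = \left(\frac{1}{\sqrt{\tau_1}}, \ldots, \frac{1}{\sqrt{\tau_n}}\right)^T$ and rewrite $D$ as
$$D = D_1 + D_2 + D_3,$$
with $D_1$, $D_2$ and $D_3$ given at the top of the page. We aim to prove $\mathbb{E}[|D|^p] \le K_p/N^p$, which is achieved by proving $\mathbb{E}[|D_1|^p] \le K_p/N^p$, $\mathbb{E}[|D_2|^p] \le K_p/N^p$ and $\mathbb{E}[|D_3|^p] \le K_p/N^p$.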

Let’s first analyze Di with the analysis of D 2 and D 3 
following similarly. We obtain 


E[\D,\P] 

W ^ ( 1-Pn Y 

- NP yi-ii-pncNj 

X {Ee^P[\Dnm + EE^P[\Dn^P])PEEn\Dnm (28) 

where (a) follows from the Cauchy-Schwarz inequality, (6) 
follows from Minkowsky’s inequality, and 

/ l~~ \ ^ T 

E\a=\^] - 

\ n n2 V? 

Dih = -s/t (zj - 

n \ n yfTj j 


, _ _ , '~£' rp 

Did = ( — 'Zn—=.] Matj— —^Zn^/t. 

Our aim is to prove that i?[|£)if,|2p] < Kph, £^[|DicpP] < Kpe 
and E[\Did\'^P] < Kpd- Following the analysis of and 

L;[|C'|p], we obtain E[\Du\^p] < Kpb. For E[\Didn], we 
have 


E[\D,dn 
1 


< 


< 


n^PrP 


1 

4p 


2p „ 

1 ^ 

Ap 


E 

Mtvj- 


-YnVt 

n 



HCjvfP 

n^Prf {j{pn) + ^ypp^n 


1 

Ap 

1 ^ 

E 

-YatVt 



n 


4p 


< Kpd- 


Let us now establish the inequality for $D_{1c}$. We can see that $Z_N$ is not independent of $M_{N,j}$; thus we cannot follow the same procedure as in our analysis of $A$ to determine the order of $\mathbb{E}[|D_{1c}|^{2p}]$. Instead, we divide $M_{N,j}$ into two parts, one that is independent of $Z_N$ and the remainder. We first write $M_{N,j} = E + F$, where $E$ and $F$ are defined at the bottom of the page. Note that $E$ is independent of $Z_N$ and $F$ is not. Then $D_{1c}$ can be rewritten as
$$D_{1c} = G + H, \qquad (29)$$
where the two terms $G$ and $H$ both carry the factor $\frac{1-\rho_n}{1-(1-\rho_n)c_N}\,\frac{1}{n}$ and involve $\left(E + (\gamma(\rho_n) + \varepsilon)\rho_n I_N\right)$ and $\frac{1}{n}Z_N\sqrt{\tau}$. Using Jensen's inequality,
$$\mathbb{E}\left[|D_{1c}|^{2p}\right] \le 2^{2p-1}\left(\mathbb{E}\left[|G|^{2p}\right] + \mathbb{E}\left[|H|^{2p}\right]\right).$$
Next, we can use the same technique as in Appendix F-A to prove that $\mathbb{E}[|G|^{2p}] \le K_p$ and $\mathbb{E}[|H|^{2p}] \le K_p$. Therefore, we obtain $\mathbb{E}[|D_{1c}|^{2p}] \le K_p$.

Thus far, we have proven that $\mathbb{E}[|D_{1b}|^{2p}] \le K_p$, $\mathbb{E}[|D_{1c}|^{2p}] \le K_p$ and $\mathbb{E}[|D_{1d}|^{2p}] \le K_p$. Coming back to (28), we obtain $\mathbb{E}[|D_1|^p] \le K_p/N^p$.

Following similar arguments to our analysis of $\mathbb{E}[|D_1|^p]$, we can also obtain $\mathbb{E}[|D_2|^p] \le K_p/N^p$ and $\mathbb{E}[|D_3|^p] \le K_p/N^p$. As $D = D_1 + D_2 + D_3$, by Minkowski's inequality, we obtain $\mathbb{E}[|D|^p] \le K_p/N^p$.